ABSTRACT

An automatic abstracting system, named ADAM, has been implemented on
the IBM 370. ADAM receives journal articles as input and produces abstracts
as output. An algorithm has been developed which considers every sentence
in the input text and rejects sentences which are not suitable for inclusion
in the abstract. All sentences which are not rejected are included in the
set of sentences which are candidates for inclusion in the abstract.

The quality of the abstracts can be evaluated by means of a two-step
evaluation procedure. The first step of the procedure determines the
conformity of the abstracts to the defined criteria for an acceptable
abstract for the given system. The second step provides an objective
evaluation criterion for abstract quality based on a comparison of the
abstract with its parent document. The second step is based on the
assumption that an abstract should present the maximum amount of data from
the parent document in the minimum amount of length.

Based on the results of this evaluation, several techniques have been
developed to improve the quality of the abstracts. These procedures modify
the form, arrangement, and content of the sentences selected for the
abstract. The revision, deletion, or creation of sentences is performed
according to a number of generalized rules which are based on the structural
characteristics of the sentences. This modification produces abstracts in
which the flow of ideas is improved and which represent a more nearly
coherent whole. The abstracts also show improvement according to the
objective evaluation criteria.

CHAPTER I. INTRODUCTION

1. The Need for Abstracting

    One of the diseases of this age is the multiplicity of books;
    they doth so overcharge the world that it is not able to
    digest the abundance of idle matter that is every day hatched
    and brought forth into the world.

                                        Barnaby Rich, 1600

The disease of 1600 caused by a multiplicity of books has turned
into an epidemic in the 20th century caused by a multiplicity of books,
journals, monographs, newspapers, reprints, preprints, and xerox
copies. The world today is inundated with a barrage of printed messages
representing the ever increasing amounts of scholarly, and
not-so-scholarly, research. We live in an era of rapid discovery and
far-reaching exploration and we have come to expect that the results of
these endeavors will be recorded and published for future use. The
problem that arises from the abundance of publications stems from our
efforts to digest and organize the published ideas.

The ability of individual human beings to know and understand all
kinds of data has not kept pace with the amount of data that can be
known. The volume of recorded data has increased because it results
from the collective achievements of millions of individuals, but the
mental capacity of any one individual has remained almost constant
throughout the years. Any one individual can know only a fraction of
the total available data; therefore, his knowledge of some data must be
limited to a few key ideas. These key ideas represent an abstraction
of the total data available. There is thus an inherent need for
abstraction. Without it the collective knowledge of mankind would
remain virtually constant at a very primitive level. But it should be
noted that although abstracting itself is a necessity, the form of the
abstract may vary.

2. The Need for Abstracts in the Scientific Community

In the scientific community, the problem created by the increase
in the amount of available data seems especially acute. No one
researcher is able to keep up with the vast amount of published research
results. He must specialize and limit the number of areas of research
about which he wishes to be well informed. He must choose only a few
articles that he wishes to study thoroughly. Abstracts can help the
scientist identify those publications which have the greatest potential
value and therefore warrant his attention. The purpose of abstracts in
technical literature is to facilitate quick and accurate identification
of the topics of published papers. Their objective is to save a
prospective reader time and effort in finding useful data in a given
article or report (1). Abstracts can play a vital role both in coping
with the influx of new data and in searching for older data (2).

While abstracts have proved to be valuable aids to the researcher,
the tremendous increase in the amount of published research which they
are supposed to ameliorate has caused problems in their production.
Most abstracting services rely on subject experts to read the documents
and to write original abstracts. This method of abstract production
becomes more cumbersome with each increase in the volume of documents
to be abstracted. Such increases demand more experienced abstractors,
more facilities and more efficient methods of handling all phases of
abstract production. It becomes more difficult to maintain a uniform
level of quality and exhaustive coverage of any given field. These
problems are usually most clearly reflected in periodic rises in the
cost of a subscription to the abstract journal. One of the appealing
possible solutions to these problems is the development of computer
systems to abstract documents.



3. The Production of Abstracts by Computer

Perhaps the greatest obstacle to the complete automation of the
process of abstracting is our lack of understanding of how human beings
are able to abstract documents. We have very little knowledge of the
method of selection of certain ideas to be included in the abstract and
of the linguistic processes used in expressing those ideas in a coherent
abstract. There must be an increased understanding of the methods of
manual production of abstracts in order to improve the quality of
computer produced abstracts. As Louise Schultz states, "To assign to
a machine the tasks of processing language requires enough knowledge
of language to permit design of the processes and evaluation of the
processing." (3) Greater understanding of linguistics will aid in the
development of computer produced abstracts and the development of
computer produced abstracts will contribute to a greater understanding
of linguistics. The study and formalization of the intellectual task
of reading a document and writing an abstract will provide some specific
algorithms that may be useful in modeling the way human beings perform
the same tasks.

The production of abstracts by computer is also of interest because
it represents the application of many computer and information science
techniques to the solution of a specific problem. For example, the
input of the text and dictionary constitutes a problem of large-file
handling and storage allocation. The matching of the dictionary with the
text constitutes a problem of dictionary look-up. The expression of
the abstracting algorithm as a Turing machine constitutes a problem of
formal language theory. The development of the abstracting algorithm
constitutes a problem of semantic and syntactic analysis. The
evaluation of the abstracts produced constitutes a problem of information
theory.

4. The Scope of the Dissertation

"The ultimate goal," according to Wyllys, "of research in automatic
abstracting is to enable a computer program to 'read' a document
and to 'write' an abstract in conventional prose style, but the path
to this goal is full of yet unconquered obstacles." (4) The goal of
this dissertation is to gain a greater understanding of the processes
needed to enable a computer program to 'read' a document and to 'write'
an abstract. This study is designed to reflect an interest in the
production of high quality abstracts and in the development of a
workable computer-based production system.

The results of my research are reported in the four succeeding
chapters. Chapter II provides background for the development of
abstracting services and a review of previous research in automatic
abstracting. Chapter III consists of a description of the existing
abstracting programs. Chapter IV describes an objective criterion for
the evaluation of the quality of computer produced abstracts. Chapter
V presents methods for improving the abstracting system, including
revision and creation of sentences to be included in the abstract, as
well as directions for future research.

CHAPTER II. BACKGROUND AND REVIEW OF PREVIOUS WORK

1. The Nature and Definition of Abstracting

1.1. Introduction

A discussion of the concepts "abstract" and "abstracting" is
fraught with difficulty because the data upon which one must draw cover
such a great span of time and because the concepts have until very
recently been part of a venerable empirical science. Thus, while the
term "abstract" implies some kind of reduced form of a corpus of things,
the nature of the reduction is poorly defined. And while the term
"abstracting" implies a method of reduction, the known methods are all
purely descriptive. Let us consider briefly the nature and significance
of the traditional use of these terms.

The idea of abstraction pervades the whole of science, and the term
suggests a transition from specific to general, from individual
observations to class description. Thus molecular structural
representation of chemical substances is an abstraction based on the
detailed examination of many, but by no means all, individual chemical
species. A description of some piece of research is an abstraction of the
actual research. We might say therefore that an abstract is the
quintessence of that from which it derives; that an abstract is a document
shorn of detail. This notion is most often expressed by saying that an
abstract is a condensed or abbreviated version of a document.²

¹ The material in this Chapter will appear in the Encyclopedia of
Computer Science and Technology (1) under the joint authorship of
my adviser and myself.

When the term "abstract" is used alone it seems ambiguous. It may
refer to the notion of an abstract property or expression ("World War
II" is a concrete expression, while the word "war" is abstract). It
may refer to title to real property (in this sense an abstract is a
written, connected, chronological summary of the essential portions of
all recorded documents and facts which can be discovered by a search of
the public records of the jurisdiction within which the realty is
located (2)). Or it may refer to a summary of the principal findings
of the work reported in a paper (3). We are concerned here with the
last of these three senses of "abstract," but it should be obvious that
the three senses are really quite similar in intent, the unity of
purpose being the representation of the essential qualities of the
thing(s) abstracted.

1.2. Historical Development of Abstracting

In tracing the development of information storage and retrieval,
it is important to ask: What is the purpose of creating records of
man's experience, of abstracting, indexing, consolidating and distilling
the contents of these records, of providing repositories for their
housing and for central public access thereto? As Libbey (4) puts it,

    Very little advance in culture could be made, even by the
    greatest man of genius, if he were dependent for what
    knowledge he might acquire upon his own personal observations.
    Indeed, it might be said that exceptional mental ability
    involves a power to absorb the ideas of others, and even that
    the most original people are those who are able to borrow most
    freely.

² This is a very loose description and does not seem to be entirely
compatible with the notion expressed in the preceding sentence. This
point will be dealt with later when the term "abstract" is given a
more formal definition.

Without the availability, to present generations, of records of the
experiences of those who came before, one would have no foundation
upon which to build; one would have always to begin afresh. This
sharing of experiences is at the heart of the need for and interest in
personal communication of any sort. In fact, we can say that
communication is the sharing of experiences. But most of us will never
have the opportunity for personal communication with many of those
individuals whose experiences we would most like to share (even if we
knew of the existence of such individuals). But records of the
experiences of these individuals are, fortunately, made available to us
and we can thus "commune" with persons long since dead or otherwise
separated from us. This is the fundamental reason for the existence of
libraries, of whatever form.

Concerted effort to bring scientific inquiry to an organized state
seems to have had its genesis in the 17th century. According to some,
Bacon (5) was most instrumental in bringing about this shaping and
directing of human activity. As a result of Bacon's efforts to cause
the establishment of research as a means of regenerating learning, and
to found a college of research for the purpose of fostering the New
Philosophy (the scientific method) and of providing for the publication
of such discoveries which the research revealed, the Royal Society
(London) was eventually founded, and subsequently the whole fabric of
European scientific societies was established. The Royal Society may
be credited with initiating the trend for scientists to publish the
results of their research. The journals where individuals first
transmit these results have come to be known as the primary literature.

The Royal Society was founded in 1662 and proceedings of its
meetings³ were first published in 1665. But it was not until the 19th
century, almost 200 years after the first serial publication appeared,
that the collection of recorded data had grown to proportions which
made it desirable to collect and summarize these data on a regular
basis. In 1821, Berzelius began his Jahresberichte über die Fortschritte
der physischen Wissenschaften. But Berzelius' effort was preceded by
a pharmaceutical "yearbook"⁴ which had been started in 1795 with a
similar purpose. However, these publications were yearbooks in the
sense of being annual summaries rather than more frequently appearing
periodicals. But as the number of articles increased, the need for
more frequently published summaries or abstracts was felt.
Pharmaceutisches CentralBlatt was begun in 1830 to satisfy this need
and, as Figure 2.1 demonstrates, the trend toward more, more frequent,
and more specialized abstracting publications has resulted in the
concurrent appearance of a large number of them which presumably serve
to make available the record of some portion of man's experience to
modern scientists.⁵ In fact it is fashionable to illustrate the parallel
growth of the primary journal literature and the secondary (abstract)
journal literature by means of a graph such as that of Figure 2.1. The
parallel is striking, but the Figure raises the more interesting question:
When the number of abstract journals becomes too large, what new
representation of man's experience will assert itself, say around the
year 1960? In other words, is there now a third level of experiential
representation that is rising in importance and that will one day appear
as a third "curve" on a graph such as that of Figure 2.2?

³ The Philosophical Transactions.

⁴ Berlinisches Jahrbuch für die Pharmacie und die damit verbundenen
Wissenschaften (discontinued in 1840).

1.3. Intensional Properties of Abstracts

A handbook for authors (7) admonishes an author to be aware of the
"importance taken on by his abstract."

    The ideal abstract will state briefly the problem, or purpose
    of the research when that information is not adequately contained
    in the title, indicate the theoretical or experimental plan
    used, accurately summarize the principal findings, and point
    out major conclusions. The author should keep in mind the
    purpose of the abstract, which is to allow the reader to
    determine what kind of information is in a given paper and to
    point out key features for use in indexing and eventual
    retrieval. It is never intended that the abstract substitute
    for the original article, but it must contain sufficient
    information to allow a reader to ascertain his interest.

    The abstract should provide adequate data for the generation
    of index entries concerning both the kind of information
    present and the key compounds reported.

Chemical Abstracts Service's Directions for Abstractors (8) states that

    CA publishes informative abstracts which contain the significant
    content of published works. A CA informative abstract is a
    concise rendition of the significant content of a bibliographically
    cited paper or report which provides enough of the new
    information contained in the work with sufficient abbreviated
    details to enable a reader to determine if it is necessary to
    consult the complete work.

⁵ I realize that one must interpret this term very broadly if all
those, whom I might better call students, are to be included in
the ranks of those who have need of and use recorded knowledge.

Figure 2.2 Third level of experiential representation hypothesized to
parallel the growth in primary and secondary literature
(after Figure 2.1).

This description of an "informative abstract" leaves a great deal to
the discretion of the abstractor, but the publication also gives
additional guidance on significant content:

    The following components are considered to be significant in
    the contents of an article and are included in CA informative
    abstracts:

    - The purpose and scope of the work, if it is not evident
      from the title.

    - New reactions, compounds, materials, techniques, procedures,
      apparatus, data, concepts, and theories.

    - New applications of established knowledge.

    - The results of the investigation.

    - The author's interpretation of the results and his
      conclusions derived from them.

Wyllys (9) has provided a more succinct, if not more precise,
description of the term "abstract."

    An abstract is

    a) a description of, or restatement of, the essential content
       of a document

    b) which is phrased in complete sentences (except for
       bibliographic data)

    c) and which usually has a length in the range from 50 to
       500 words.

Such descriptions of abstracts as those given above, although
wanting in definitional precision, do provide a basis for the
formulation of a definition. Thus, the abstract should consist of
complete sentences rather than keywords or phrases, or association maps
(10). The abstract should (usually) be short, although the relation
between the length of the article and its abstract is prescribed in
many different ways (11). The essential content of the article as
reflected in the purpose of the work, the results and conclusions, new
data, etc., should be included in the abstract. The problem with this
last statement is that of determining what is the "essential content";
it is a problem central to all of language processing.

There are two ways of viewing an abstract: as a structural element
to be formed and manipulated, or as an intensional element. Most, if
not all, current attempts at a definition of the term "abstract" mix
these two views and consequently fail to provide a definition.

The following definition of an abstract (12), based upon structural
considerations, provides a workable and useful basis for a better
understanding of the abstracting process (but, as will be seen, it is
incomplete in the form given here).

An abstract, A, is a set of sentences, s, such that

    A = { s | s ∈ D }

where D is the document abstracted,⁶ and such that certain
transformations on the set, s, are allowed:

    concatenation
    truncation
    phrase deletion
    voice transformation
    paraphrase
    division
    word deletion

⁶ Document, as well as other fundamental terms, is defined in (13).
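
The structural definition above can be illustrated with a minimal Python sketch. It is not ADAM's algorithm: the function names, the rejection rule, and the very crude string handling are all assumptions made for illustration, and only three of the seven listed transformations are shown.

```python
from typing import Callable, Iterable, List, Set

Sentence = str


def truncate(sentence: Sentence, max_words: int) -> Sentence:
    """Truncation: keep only the first max_words words."""
    words = sentence.rstrip(".").split()
    return " ".join(words[:max_words]) + "."


def delete_words(sentence: Sentence, unwanted: Set[str]) -> Sentence:
    """Word deletion: drop words whose lowercase form is in `unwanted`."""
    kept = [w for w in sentence.split() if w.lower().strip(".,") not in unwanted]
    return " ".join(kept)


def concatenate(first: Sentence, second: Sentence) -> Sentence:
    """Concatenation: join two sentences with a semicolon."""
    return first.rstrip(". ") + "; " + second[0].lower() + second[1:]


def make_abstract(
    document: Iterable[Sentence],
    keep: Callable[[Sentence], bool],
    transform: Callable[[Sentence], Sentence] = lambda s: s,
) -> List[Sentence]:
    """Select sentences s from D, then apply one allowed transformation to each."""
    return [transform(s) for s in document if keep(s)]


# Hypothetical three-sentence document D.
document = [
    "The ABC process yields a very durable product.",
    "We thank the sponsors.",
    "The two processes are about equal in processing speed.",
]
abstract = make_abstract(
    document,
    keep=lambda s: "thank" not in s,  # hypothetical rejection rule
    transform=lambda s: delete_words(s, {"very"}),
)
print(abstract)
```

Every sentence of the result is either a sentence of the document or one of the allowed transformations of it, which is the property the definition uses to separate abstracts from reviews and commentary.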

What does this definition do for us? First, it provides a more precise
means of determining whether a document is an abstract than do earlier
descriptions. Thus, one of its most important advantages is that it
distinguishes abstracts from critical reviews⁷ or other literary
"inventions" of the abstractor. Second, it makes no mention of "content"
or "meaning," notions which are difficult to deal with and which may
vary with the purpose of an abstract. Thus, the definition distinguishes
between what an abstract is and what an abstract does. Third, the
definition allows for a certain stylistic freedom, but stylistic
freedom does not encompass editorial comment (the latter is precluded
by the definition). Finally, the definition is applicable to human, as
well as to other, abstractors, thus permitting comparison of the
abstracts produced by different abstracting systems.

It should be noted that the definition given above does not
prescribe the length of an abstract. Length is a function of the
purpose of an abstract and is therefore not of the essence of the
concept "abstract." Abstract length is a variable under the control of
the abstracting system rather than one whose values are dictated by the
definition of the product of the system.

⁷ An author of a critical review includes added interpretation and
criticism of the article that he is reviewing.

While the definition provided above is useful, it is really
incomplete because the intension of an abstract is not provided for.
Abstracts are usually placed into one or the other of two intensional
classes: informative or indicative. An informative abstract is one
characterized as containing some (or all) of the data contained in the
original document. An indicative abstract is one which indicates that
certain data is contained in the original document, but that data is
not contained in the abstract.⁸ Examples of informative and indicative
abstracts are given in Figure 2.3. Many variations on these two classes
of abstract have been described (14). In practice, most abstracts
prove to be both informative and indicative, so it is perhaps less
important to consider abstracts as belonging to one or the other of
these classes than it is to consider the user population they are to
serve.

To know the needs and desires of a system's user population is no
easy accomplishment. But if I assume such knowledge for existing
abstracting services, I can observe that Chemical Abstracts, for
example, gives greater emphasis to current work in chemistry than to
articles relating historical data (state-of-the-art reviews, books,
etc.) because articles of the first type are represented by informative
abstracts while the latter type are represented by indicative abstracts.
This observation leads to the inference that the users of Chemical
Abstracts are, on the whole, more concerned with current work than with
historical data.

⁸ Hence an informative abstract is also indicative, but the inverse of
this statement is not true.

    ABC is judged a more satisfactory process for the waterproofing
    of synthetic textiles than XYZ. The ABC process yields a
    product of 20 percent greater durability as judged by standard
    test #1234. It also yields a better appearing product based on
    the votes of a panel of 25 textile finishing specialists. The
    cost of ABC is claimed to be 10 percent less per square yard
    although specific cost data are not given. The two processes
    are about equal in processing speed.

    The finishing of textiles by process ABC to achieve water
    repellency is considered superior to finishing by process
    XYZ. Factors considered include durability, appearance,
    cost, and speed.

Figure 2.3 Examples of informative (top) and indicative (bottom)
abstracts. (Reproduced from Abstracting Scientific and
Technical Literature (10).)

These observations lead me to the conclusion that the purpose
(which I have inferred from the examination of the products of various
abstracting systems) of the abstract dictates the particular set of
sentences which constitute the abstract. The purpose of an abstract is
met by controlling the intension of the abstract; thus, a chemical
compound appearing in an abstract in Index Chemicus will almost
invariably carry with it the intension of "newness," while a compound
appearing in Biological Abstracts will most likely bear the intension
of application in some biological system, regardless of its "newness."
I see, therefore, that the structure of the abstract and the intension
of the abstract are not independent, so the definition given earlier
must be modified to account for the relationship between structure and
intension.

1.4. Formal Definition of "Abstract"

Let me rephrase the definition of abstract given earlier, this
time in the language of automata theory. The definition of an abstract
can be given in terms of an automaton, M_a, denoted

    M_a = (K, Γ, Σ, δ, q₀, F)

where

    K  is a finite set of states.
    Γ  is a finite set of allowable input and output symbols: the
       original document, the abstract, and any additional data.
    Σ  is the set of input symbols, i.e., the original document.
    δ  is the next move function which is defined by machine
       configurations, selection rules, and transformations.
       This function also specifies the output of the automaton,
       the elements of the abstract.
    q₀ is the start state, which corresponds to the input of the
       title.
    F  is the final state, which corresponds to the completion
       of the abstract.
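
As a rough illustration of how the six components fit together, the following Python sketch runs a toy automaton over a four-sentence document. The state names, the end-of-input marker, and the rejection rule inside `delta` are hypothetical; the source does not specify the actual configurations or selection rules at this point, and the input set Σ is held as an ordered tuple purely so that the title can be read first.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

END = "<end>"  # hypothetical end-of-input marker, not part of the source definition


@dataclass(frozen=True)
class AbstractingAutomaton:
    states: FrozenSet[str]    # K: finite set of states
    symbols: FrozenSet[str]   # Γ: allowable input and output symbols
    inputs: Tuple[str, ...]   # Σ: the original document, title first
    delta: Callable[[str, str], Tuple[str, Optional[str]]]  # δ: next move and output
    start: str                # q0: state in which the title is read
    final: str                # F: state reached when the abstract is complete

    def run(self) -> List[str]:
        """Feed every input symbol to δ and collect the emitted sentences."""
        state, abstract = self.start, []
        for symbol in self.inputs + (END,):
            state, output = self.delta(state, symbol)
            if state not in self.states:
                raise ValueError(f"unknown state: {state}")
            if output is not None:
                if output not in self.symbols:
                    raise ValueError(f"output not in Γ: {output}")
                abstract.append(output)
        if state != self.final:
            raise RuntimeError("automaton halted outside the final state")
        return abstract


def delta(state: str, symbol: str) -> Tuple[str, Optional[str]]:
    """Toy next move function: read the title, then reject or emit each sentence."""
    if symbol == END:
        return "DONE", None
    if state == "TITLE":
        return "BODY", None              # the title is read but not emitted
    if "acknowledge" in symbol.lower():  # hypothetical rejection rule
        return "BODY", None
    return "BODY", symbol


document = (
    "Automatic Abstracting",
    "ADAM produces abstracts.",
    "We acknowledge support.",
    "Sentences are rejected by rule.",
)
machine = AbstractingAutomaton(
    states=frozenset({"TITLE", "BODY", "DONE"}),
    symbols=frozenset(document),
    inputs=document,
    delta=delta,
    start="TITLE",
    final="DONE",
)
print(machine.run())
```

Because Γ here contains only the sentences of the document, the toy machine can emit nothing but original sentences.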



Although the above definition of "abstract" may seem somewhat
abstruse, it is really an operational definition. This definition
provides the relationship between the abstract and the original
document in terms of an abstracting algorithm. The set of states and
associated mappings constitute an algorithm which is a realization of
the intention of the machine. The machine can be given explicit
definition for a particular abstracting system by specifying all the
parameters necessary for operation of the automaton.

Based on the values of certain parameters, various types of
abstracts can be defined. When the set of allowable input and output
symbols, Γ, contains only the sentences of an original document then
the resultant abstract is a selection of sentences from the original,
or an extract. When the set Γ contains additional symbols, such as
alternative sentence structures and vocabulary items, which allow for
the abstract to contain paraphrases of the original sentences, then
an informative abstract is produced. When information about the
original document is supplied by the abstracting system, i.e., when
such information can serve as input to the automaton, then an
indicative abstract is produced. Other types of abstracts that have
been identified by authors, for example, alerting abstracts (11) and
critical abstracts (14), reflect the orientation of the information
that is added to the set of allowable input symbols. By completely
specifying all of the parameters in the formulation, an abstract will
be completely defined and an algorithm for production of the abstract
will also be available.
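
The three cases in the preceding paragraph can be written as a small decision function. This is only a sketch: the function name, the use of Python sets, and in particular the order in which the cases are tested are assumptions, since the text does not say which case takes precedence when more than one applies.

```python
from typing import Set


def abstract_type(document: Set[str], gamma: Set[str], about_document: Set[str]) -> str:
    """Classify the abstract an automaton would produce from its parameters.

    document       -- sentences of the original document
    gamma          -- allowable input and output symbols (Γ)
    about_document -- information about the document supplied as extra input
    """
    if about_document:
        # Information about the document serves as input to the automaton.
        return "indicative"
    if gamma <= document:
        # Γ holds nothing beyond the original sentences.
        return "extract"
    # Γ also holds paraphrases, alternative structures, or vocabulary items.
    return "informative"


doc = {"ABC is superior to XYZ.", "ABC costs 10 percent less."}
print(abstract_type(doc, doc, set()))
print(abstract_type(doc, doc | {"ABC is cheaper than XYZ."}, set()))
print(abstract_type(doc, doc, {"Factors considered include cost."}))
```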

I have dealt with the concept of an abstract, considering in
particular

    1. what an abstract is,

    2. the purpose of an abstract, and

    3. to some extent how an abstract may be produced.

In the following sections, I consider the last point in greater
detail. In general, an abstract is produced by an abstractor or an
abstracting system. This latter term is more general (the former term
seems to imply a human as the abstracting system) and can be applied
with equal ease to humans or other kinds of machines which are capable
of executing an abstracting algorithm. In the sequel I will briefly
consider human abstracting systems and then discuss computer-based
systems.

2. Methods of Producing Abstracts

2.1. Human Abstracting Systems

Although automatic abstracting is a popular current topic, all
operational abstracting systems are largely human abstracting systems.
That is, the reading of the original document, selection of portions
of it for the abstract, the writing of the abstract and, in many
instances, its formatting and editing, are processes performed by
humans. The most important of these processes, from the viewpoint of
the abstract user, is the selection of material from the original
document for inclusion in the abstract. But this selection may be
influenced considerably by the other processes.⁹

The process of translation from one language to another, as carried 
out by human abstractors, may result in considerable 'variation in the 
size and quality of the abstract. Such variations depend largely upon 
the abstractor's facility with both the source language and the target 
language. Although no data are* available to support this contention, 
it seems unlikely that '',n abstract prepared by an abstractor, whose 
facility with one of the languages is poor, will be of as good quality 
(a^s amplified below) as that prepared by one fluent in that language. 



A good deal of the discussion to follow is of necessity conjecture, 
but It should not be particularly difficult '(however time consuming) 
to obtain data with which to affirm or deny the ass,ertions made 
here. These* studies would of ccarse,* involve human subjects and 
unless the"^ experiments were carefully designed and executed, the 
results would be of doubtful value. 



\ 




N. 



23 

The fault is most serious when the source language is not well understood
by the abstractor. Then the abstract may be well written but may not
represent the content of the original document. On the other hand,
when the target language is the source of difficulty the abstract will
show this and the reader will be alerted to the possibility for error
or misinterpretation on the part of the abstractor.

A second factor which influences the selection of material for the
abstract is the abstractor's knowledge of the subject area of the
document being abstracted. The abstractor who is expert in the subject
area will likely produce an abstract which is shorter, more general
and which requires more knowledge on the part of the reader than an
abstract produced by an abstractor whose knowledge of the subject area
is marginal. On the other hand, the latter type of abstractor is
perhaps less likely to get the main thrust of the document into his
abstract than the subject expert. What is probably needed is an
abstractor whose qualifications lie between the extremes of expertise
and passing knowledge in a subject area. Again, there are no data to
support these contentions, but the following statement from an editor
of Chemical Abstracts (16) lends some credence to them:

...the best way to get all of the new and significant information
of a paper into an abstract expressed in proper technical terms,
is to obtain the spare-time abstracting service of someone
actively interested and working in the specific field of chemistry
into which a paper being abstracted fits.

The suggestion is that it is better (and easier) to teach subject
specialists how to abstract rather than to attempt to teach abstractors
a subject specialty.

Another factor which must influence selection of material for the
abstract is the imposition of formatting and editorial responsibilities
upon the abstractor. The more mechanical tasks the abstractor must
perform, the greater is the likelihood that the quality of the abstract
will suffer. Undoubtedly, Zipf's law of least effort (17) comes into
play here.

Thus it is suggested that the best abstracts will be produced by a
human abstractor who

1. is expert in the subject area, and is taught how to prepare
abstracts

2. is fluent in both the source and target languages

3. is subjected to as few purely mechanical tasks as possible.

Although these are not the only factors which influence the quality of
human-produced abstracts, they are probably the most influential.
These and other factors are considered again later in connection with
studies of abstractor consistency.

2.1.1. Operational Systems

Computer systems are presently being utilized in operational
abstracting systems to speed up the production of abstract journals.
The basic processes in the production of abstract publications can be
defined in terms of the following five steps: 1) document selection
and acquisition, 2) input processing, 3) abstracting, 4) publication,
and 5) announcement and distribution (see Figure 2.4).

The acquisition and selection of documents to be abstracted is
essentially a manual operation at present. Most primary sources of
information are received as hard copy and must be converted into
machine-readable form for input to the computer system. But current
trends in information transfer suggest that increasing reliance will
be placed on the exchange of information in machine-readable form.
Through use of machine-readable records the process of producing
secondary publications can become an integral part of other processes
involved in information-transfer technologies. Publishers of primary
journals, for instance, may be able to provide abstracting services 
with the machine-readable data used to typeset the original article. 

Source-data preparation, or input processing, today is largely
manual and consists basically of keyboarding punched paper tapes,
punched cards, magnetic tapes, or disks. This input can then be
edited using display-and-edit programs available through
cathode-ray-tube terminals. Increased use of optical character readers
may also alleviate the need for transcription from hard copy to
machine-readable form. Furthermore, the availability of primary
sources in machine-readable form may eliminate the need for input
processing in the foreseeable future.

Abstracting is currently a manual operation in all production 
environments. Subject experts are employed who have been trained to
write abstracts. In order to meet the recognized need for qualified
personnel to write abstracts, some journals now require that authors
submit abstracts in addition to their articles. This is not a totally
adequate solution because the author-generated abstracts are
inconsistent with respect to style and coverage and frequently do not
reflect the point of view of the abstracting service. Although research
efforts appear promising, abstracts which are produced by computer
cannot yet meet the demands of abstract-journal publications. The
abstracting operation itself will probably be the last to be fully
automated in the production of abstracts.

Many abstract publications rely heavily on automatic data processing
equipment for production of their journals. Secondary publications,
such as Index Chemicus, Biological Abstracts, Psychological Abstracts,
Index Medicus, and others like them, have pioneered in the use of
computer-aided composition and typesetting. In most applications, the
information handling capabilities of computers in printing have been
used primarily for right-margin justification, syllabification, and page
composition or for instructing hot metal or photocomposition machines
and teletransmission devices. Such modern typographic and printing
tools as photo-composition, xerographic processes, and page composition
will certainly continue to make printing faster, if not more
economical (18).

Announcement and distribution of secondary services is being
enhanced by the development and use of computer-based
selective-dissemination-of-information systems. The selective
dissemination of information to users is made on the basis of user
profiles, each of which is a compilation of keywords reflecting a
user's interests. The user is then presented with only those abstracts
which have matched his profile. The goal of such systems is to send to
the user only the abstracts which are of the greatest potential value
to him.
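
The profile matching just described can be sketched in a few lines. This is an illustration only: the tokenization, the `min_hits` threshold and the function names are assumptions made for the sketch, not details of any system cited here.

```python
import re


def keywords_in(text):
    """Return the set of lower-cased word tokens in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def match_profile(profile, abstracts, min_hits=1):
    """Return the abstracts sharing at least `min_hits` keywords with a profile.

    `profile` is an iterable of keywords describing one user's interests.
    `min_hits` is a hypothetical threshold introduced for this sketch.
    """
    wanted = {k.lower() for k in profile}
    return [a for a in abstracts if len(wanted & keywords_in(a)) >= min_hits]


abstracts = [
    "Adrenalin and acetylcholine regulate transmission of nerve impulses.",
    "Photocomposition machines speed the printing of abstract journals.",
]
# Only the first abstract shares keywords with this profile.
selected = match_profile(["adrenalin", "nerve"], abstracts)
```

Raising `min_hits` trades recall for precision: fewer abstracts reach the user, but each shares more of the profile's vocabulary.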
Chemical Abstracts Service, long a leader in improvements in
abstract journal publication, recognized in the early 1960's that the
traditional manual system for processing and publishing secondary
chemical information was too slow, too expensive, too rigid, and too
wasteful in its use of highly capable manpower to be effective in the
face of the ever growing volume of published information. A
streamlined system was designed in which each primary document would be
analyzed only once and the selected data recorded only once in a form
that could be used to produce a variety of information packages with
overlapping content. The target system designed by the Chemical
Abstracts Service for their operations[10] combines human intellectual
analysis and computer-based processing. The roles of the computer in
the system are:

1. To receive material derived by human intellectual analysis;

2. To support that analysis with machine aids and augment the
information flow by retrieving related previous work;

3. To apply automated validation checks and trigger exception
reviews by editorial staff;

4. To eliminate the necessity for manual bridging between
processing steps;

5. To automate the ordering (sorting) and formatting of the
information, both on a data-directed basis;

6. To control composition machinery; and

7. To provide computer-readable files.

This target system, which was initiated during the 1967-68 period and
will probably not be completed until the 1975-76 period, represents a
system which uses computer technology as an essential and integral
component (19).

[10] Other publications, such as Physics Abstracts, Biological
Abstracts, and Psychological Abstracts are also moving toward the
incorporation of sophisticated man/machine systems in their production.

2.1.2. Selection and Consistency 

Before any attempt is made to automate the process of abstracting, 
it is important to understand how humans produce abstracts. A number 
of studies have been reported which deal with the question: How does 
a human abstractor decide what material in the original document should 
be included in the abstract? A corollary question is: Does the 
abstractor display any consistency in the selection process

a. Within a document;

b. Between documents;

c. Through time for either a. or b.?

In order to answer these questions, one needs to know

1. What constitutes a good abstract of a document (or what
constitutes a good set of abstracts, if one allows for the
possibility of several equally good, goal-directed abstracts);

2. What is the operational definition of an abstract used by a
particular abstracting system, as embodied in the rules for
abstracting prescribed (or described) by the system,[11] and
how does it contrast with the definition of a "good" abstract;

3. Given answers to 1 and/or 2, what consistency does an
abstractor display in following the rules of the abstracting
system or in producing a good abstract (if the rules of the
abstracting system seem, or are believed to be, at odds with
the abstractors' understanding of what constitutes a good
abstract).

[11] The directions for preparing author abstracts, issued by a
particular journal, constitute an operational definition (no matter
how vague) of an abstract.

These three questions can be studied alone or in pairs, as
illustrated in Figure 2.5. Only the last of these questions has been
studied systematically, yet there is no clear answer to any of these
questions, although some of the research which has been done at least
suggests the direction further efforts should take. Such studies have
a special significance for the development of automatic abstracting
systems, as will be made clear subsequently.

The most significant studies on selection and consistency are those
of Rath, Resnick and Savage (20, 21, 22, 23) and of the Thompson Ramo
Wooldridge, Inc. (TRW) group headed by Edmundson and Wyllys (24, 25).

The main conclusion to be drawn from the Rath-Resnick-Savage
studies is that human abstractors show poor consistency in selecting
sentences for the abstract both between abstractors and with respect
to time. In an initial study (22), these workers assigned 6 human
abstractors the task of reading a set of 10 articles from Scientific
American, choosing the 20 most representative sentences from each
article, and ranking these sentences in decreasing order of
representativeness (measured against a background of those sentences
already ranked). At the same time a computer program, similar to one
developed by Luhn (26) (see Section 2.2.2.1), was used to select and
rank 20 sentences from each of the test articles. Five different
methods of ranking sentences were used in the computer-based
abstracting procedure.[12] One interesting point concerning what
sentences were selected by human and computer was reported. It was
found that the 10 articles taken as a whole contained 37% "topic
sentences" (after Baxendale (27)). Of these topic sentences, humans
selected 47% and the computer selected 33%. It was also found that
human-selected sentences came more often from the first half of the
article, while the computer made more sentence selections from the
latter half of the article.[13]

Figure 2.5 Pairwise combinations of questions in the study of sentence
selection and consistency of abstractors. The questions to
be studied are: Does the abstractor display any consistency
in the selection process
a. Within a document,
b. Between documents, and
c. Through time for either a or b?

In a follow-up study, Resnick (38) showed that abstractors differed
in their consistency in abstracting the same article at fairly widely
separated points in time. Five subjects were asked to abstract 6
articles from Scientific American, as in the previous study (23).
Eight weeks later, they were asked to abstract the same six articles.
Instructions included the admonition not to select sentences which had
previously been selected unless the subjects felt the sentences were
still representative, and to mark any currently-selected sentence they
believed they had previously selected. The accuracy of these
identifications is indicated in Table 2.1. On average, these
abstractors were able to correctly identify a currently-selected
sentence as one which had been selected previously 42.5% of the
time.[14]

[12] It is important to note that the sample sizes used in this
experiment are much too small to yield statistically valid results,
although the data do support what one would feel, intuitively, was
true. Nevertheless, the questions which this study attempted to answer
remain open.

[13] This result may suggest that human abstractors get tired more
readily than computers. Furthermore, the conditions under which the
subject-set of human abstractors worked were probably a poor
approximation of the (often harried) conditions under which the
professional abstractor works.

Table 2.2 gives data on the consistency in sentence selection over
time. It can be seen that there is greater variation among abstractors
than between articles for a given abstractor.[15] One may conclude,
from these studies (keeping in mind the smallness of the samples), that
human abstractors are modestly consistent in producing abstracts of a
given document.
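
The percentages in Table 2.2 measure the overlap between two selections made by the same abstractor. A minimal sketch of such a measure follows, assuming sentences are identified by their position in the article; the function name and the sample selections are illustrative and are not data from the study.

```python
def percent_repeated(first_trial, second_trial):
    """Percentage of second-trial sentences that were also chosen the first time.

    Each argument is a collection of sentence numbers; duplicates are ignored.
    """
    first, second = set(first_trial), set(second_trial)
    if not second:
        raise ValueError("second trial selected no sentences")
    return 100.0 * len(first & second) / len(second)


# Hypothetical selections of 20 sentences each, made two months apart.
trial_1 = range(1, 21)   # sentences 1..20
trial_2 = range(9, 29)   # sentences 9..28, of which 9..20 were chosen before
consistency = percent_repeated(trial_1, trial_2)
```

Here 12 of the 20 second-trial sentences repeat, giving 60%, which is the order of magnitude reported for most abstractor-article pairs.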

[14] This result suggests that the subjects' memories functioned less
well than their abstracting algorithms.

[15] It would be interesting to know the reason for the individual
variations, which deviated widely from the average for a given
abstractor. In Table 2.2 abstractors 2 and 4 (particularly 4) show
between-article variations which are inconsistent with these
variations for the other subjects.

Table 2.1 Mean percentage of responses of five subjects making selections
of 20 "representative" sentences two months apart from each of
six articles. During the second selection the subjects were
asked to indicate whether or not they had chosen each sentence
two months earlier (from (23)).

                                    Sentence correctly   Sentence incorrectly
                                    identified           identified

Sentence previously selected        42.5%                13.1%

Sentence not previously selected    21.7%                22.7%

Table 2.2 Percentage of sentences selected on second trial which were
the same as those selected on the first trial (from (23)).

                         Articles
Abstractor     A      B      C      D      E      F

    1         60%    55%    45%    45%    40%    40%
    2         45%    50%    75%    80%    45%    60%
    3         60%    65%    55%    70%    55%    50%
    4         45%    55%    55%    55%    40%    15%
    5         80%    70%    70%    60%    55%    80%

In another study of abstractor consistency, carried out by the
TRW group (25), it was concluded that although human abstractors were
not very consistent among themselves, the abstracts they produced were
adequate (in terms of inter-abstractor consistency) to justify a study
of the attributes of the sentences they extracted from the document.

The way in which the TRW group measured the consistency of human
abstractors is of some interest. A correlation coefficient was devised
which attempted to measure the similarity between two sets of sentences
extracted from a given document, taking into account the sizes of the
sentence sets relative to the original document and relative to each
other. This measure can be expressed as

    [equation not legible in the source]

where L is the length of the sentence T (number of text tokens) and

    [equation not legible in the source]

where N is the number of text tokens of type w_i in the sentences
extracted from a set of documents and M is the number of text tokens
of type w_i in the document set (M ≥ N). The probability that a
sentence will be extracted given N, M, L and w is q. This probability
can be derived from a consideration of the concept of inductive
probability (developed by Carnap (29)) as follows (25).

Suppose two gamblers X and Y are told that a certain document has 
been extracted and that a particular sentence of that document has 
certain specified characteristics. These characteristics, together 
with all data available on past sentence selection by abstractors, are 
summarized in a proposition called the evidence, E. Let S be the
hypothesis that the particular sentence has been extracted. Let the
betting quotient, q, for a bet on S be the ratio

    q = (stake offered for a bet on S) / (total stake)

Suppose the amount bet on S is q and the amount bet against S is 1-q.
Then one can assume the existence of some q such that X would be
equally willing to bet for or against S. Such a value of q is, as far
as X is concerned, the psychologically fair betting quotient for S
relative to the evidence, E, available. Another way of expressing the
meaning of q is to say that q indicates the preference that X believes
E confers on S. Y may be assumed also to accept either betting role
for a q having the above value.

To determine whether q is indeed fair, a given document should be
abstracted by a large number of abstractors.[16] If X and Y made a
series of bets, X for S with quotient q and Y against S with quotient
1-q, then the total balance of wins and losses would be zero if the
ratio of S_true to S_false is exactly q.
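
The zero-balance condition can be checked with a little arithmetic. The sketch below assumes a unit total stake on each bet, with the winner taking the whole stake, and it reads the condition as "S turns out true in a fraction q of the trials". That reading, like the function and variable names, is an assumption of the sketch.

```python
from fractions import Fraction


def balance_for_x(q, n_true, n_false):
    """Net winnings of X, who bets for S at betting quotient q.

    The total stake on each bet is 1: X puts up q and Y puts up 1 - q.
    X wins Y's stake (1 - q) when S is true and loses q when S is false.
    """
    q = Fraction(q)
    return n_true * (1 - q) - n_false * q


# With q = 1/4, the bets balance when S is true in 25 trials out of 100.
fair = balance_for_x(Fraction(1, 4), n_true=25, n_false=75)
# If S is true more often than q predicts, X comes out ahead.
unfair = balance_for_x(Fraction(1, 4), n_true=40, n_false=60)
```

Exact fractions are used so that the fair case balances to exactly zero rather than to a small floating-point residue.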

The TRW group (24) concluded that a representativeness score for
a sentence under hypothesis S should incorporate:

1) the degree of evidential support that E confers on S;

2) the fair betting quotient for S relative to E;

3) an E-based estimate of the ratio of S_true to S_false.

This measure is a general one because it permits the evidence, E,
to be specified in any desired manner. Thus, although the TRW study
used the measure as support for sentence selection based essentially
on frequency criteria, any well-defined criteria may be used. The
important point is that if the fair betting quotient q can be
determined, then the selection criteria may be incorporated in a
program for automatic abstracting.

In concluding this section, it is fair to say that little is known
about how, or why, human abstractors choose from the original article
what they include in the abstracts which they produce. Neither is it
clear to what extent human abstractors are consistent in abstract
production. Perhaps more importantly, especially since there is no
concrete answer to the question of what constitutes a good abstract,
questions relating to human selection and consistency may be
irrelevant. It is just possible that the abstracts produced by
humans are not good. If so, then it would be undesirable to try to
emulate by computer the processes which humans use in abstracting,
since such emulation would lead simply to a faster rate of production
of consistently poor abstracts.[17] The major unanswered question is,
"What makes an abstract a good one?"

[16] Such a determination is always hindered, in practice, by the
practical difficulties of carrying out a study involving large
numbers of abstractors and/or large numbers of documents.

[17] A product easy to produce and difficult to sell.

2.2. Computer-Based Abstracting Systems

I turn now to a discussion of abstracting systems in which the
computer plays a central role. This discussion must, like that for
human abstracting systems, be tempered by a lack of knowledge of
whether the abstracts produced by such systems are "good." Happily,
however, the questions of selection and of consistency of selection
are easily dealt with. It is these two aspects of computer-based
abstracting systems[18] that will be considered in greatest detail in
this section.

There are eight significant studies[19] related to computer-based
abstracting which have been described in the literature. These are

1. The Luhn study;

2. The ACSI-matic study conducted by IBM;

3. The Oswald study;

4. Word-association research;

5. The TRW studies conducted by Edmundson and Wyllys;

6. The Earl study;

7. The Soviet study;

8. The Rush study.

I will discuss the first seven of these studies in the remaining sections
of Chapter II and I will discuss the Rush study in Chapter III. Before
I discuss these studies, it will be useful to describe the basic
components of a computer-based abstracting system.

[18] I prefer to call the process of abstracting by computer
"computer-based abstracting"; the abstracting system a "computer-based
abstracting system"; and the abstract produced by such a system a
"computer-produced abstract," rather than to use the term automatic in
place of computer-based as has commonly been done in the literature.

[19] To be sure, one could include other studies in this list, notably
those of Baxendale (27) and of Climenson, Hardwick and Jacobson (30),
but I believe those listed to be representative of the various
approaches taken to the production of abstracts by computer program,
and that they are also of some historical importance.

2.2.1. Basic System Components

A computer-based abstracting system (see Figure 2.4) must

a. read the document to be abstracted,

b. analyze the document,

c. apply a set of selection and/or transformation rules to
produce the abstract,

d. format the resulting abstract, and

e. print the abstract.

Reading[20] the original document is perhaps as difficult a task as
any subsequent processing steps the abstracting system must perform.
A technical paper will commonly use many different characters in
several different sizes and in a half-dozen different fonts. The
ordinary computer-system input devices are not capable of handling
such a wide range of characters, nor are there optical readers (optical
scanning devices) available which can do so economically.[21] One or
both of two simple solutions to this problem are usually effected:
1) pre-edit the text, eliminating or altering those portions of the
document which the input devices cannot handle; 2) use a scheme of
flagging (coding) of characters which cannot be read directly. Figures,
tables and other graphic materials will in all probability not find
their way into the computer system at all. Pre-editing is most likely
to be used, since the special text features which might otherwise be
preserved are of questionable value in the actual abstracting process
and since, until the use of optical printers becomes wide-spread,
computer printers cannot economically (if at all) print the necessary
range of characters.

[20] Reading obviously involves seeing, and there are a variety of
methods for recording data so that a computer can "see" (sense) it.
These methods are beyond the scope of this work, but the reader is
referred to (31, 32) for details and further references.

[21] Except, perhaps, on a very large scale.

Once the text has been read into computer memory, analysis of the
input text is performed. The actual analytical methods used depend
upon the particular abstracting system, so a discussion of this
component will be deferred until section 2.2.2.1.

Similarly, rules for selecting parts of the original document for
inclusion in the abstract are considered under each of the specific 
studies which are discussed, beginning in section 2.2.2.1. 

Formatting the output of the abstracting process has not been 
particularly inspired. The usual methods include listing individual 
sentences or, occasionally, printing the abstract paragraphed as was 
the original document. Abstracts have never been noteworthy examples
of literary work nor of the typographic art, so one should not
criticize the output of computer-based abstracting systems too
severely (although it does not seem unreasonable to hope that a little
imagination might be applied to the formatting of computer-produced
abstracts). No existing computer-based abstracting system has utilized
any output device other than the line-printer.[22]

2.2.2. Major Research Efforts in the Production of Abstracts 

With this general view of the basic processes involved in
computer-based abstracting, I now turn to a discussion of specific
studies which have led to the implementation, at least experimentally,
of computer-based abstracting systems.

2.2.2.1. The Luhn Study

Luhn is credited with having first suggested and demonstrated 
that abstracts could be produced via computers (26). The procedures 
employed by Luhn for generating abstracts by computer were as follows. 
a. The document to be abstracted was first punched into cards
(texts which were used required no pre-editing) and then
transferred to magnetic tape.

b. The text was then read, word by word. Common words[23] were
deleted through table look-up. The remaining words, called
content words, were associated with any punctuation that
preceded or followed them, and their exact location in the
original document was noted.

c. The content words were then sorted into alphabetical order.

d. Words of similar spelling were "consolidated" as follows.
Successive pairs of word-tokens[24] were compared letter-by-letter
and at the first point of difference a count was initiated of
the number of non-matches observed from that point to the end
of the longer word-token. If this count was less than seven
(<7), the word-tokens were taken to be of the same word-type
(i.e., to represent the same notion), otherwise the word-tokens
were taken to be distinct word-types. The frequency of
occurrence of each word-type was then determined and word-types
occurring less frequently than some prescribed value were
deleted. The remaining word-types were considered to be
"significant".

e. The significant word-types were then sorted into location
order.

f. Sentence representativeness was next determined. Sentences
were divided into substrings each of which was bounded by
significant words separated by no more than four non-significant
words. (Significant words separated from other significant
words by more than four words were called "isolated" words
and were not given further consideration.) For each substring
(cluster), a representativeness value r_i was calculated
according to the equation

    r_i = p_i^2 / q_i

where p_i is the number of representative tokens in the cluster
and q_i is the total number of tokens in the cluster. The
highest r_i for a sentence is taken as its representativeness
of document content. Sentences having a value of r_i above a
prescribed value (or else a predetermined number of sentences
of highest r_i) were selected for inclusion in the abstract.

g. The abstract was then printed (as a set of sentences), formatted
in paragraph style.

[22] Certain abstract journals are printed in a two-step process in the
first of which an optical printer (COM, computer-onto-microfilm
processor) is used. These publications do not, however, result
from the compilation of computer-produced abstracts.

[23] Common words might well be called non-substantive words, since these
are words that are considered to have no value in determining the
significance of a portion (sentence, paragraph, etc.) of text.
Common should not be confused with function in reference to word
classification.

[24] A word-token (sometimes text-token) is a place-holder in a text.
Each different word-token is called a word-type (or text-type).
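
Steps d and f of the procedure above can be sketched as follows. This is an illustration written from the description in this section, not Luhn's program: the function names, the word lists and the reading of "non-matches" as a count of differing letter positions are assumptions.

```python
def mismatch_count(a, b):
    """Count letter positions at which two word-tokens fail to match.

    The shorter token is padded so that every extra letter of the longer
    token counts as a non-match. No mismatch can occur before the first
    point of difference, so counting over the whole word is equivalent to
    counting from that point onward.
    """
    longer, shorter = (a, b) if len(a) >= len(b) else (b, a)
    padded = shorter.ljust(len(longer), "\0")
    return sum(1 for x, y in zip(longer, padded) if x != y)


def same_word_type(a, b, limit=7):
    """Step d: tokens with fewer than `limit` non-matches are consolidated."""
    return mismatch_count(a, b) < limit


def sentence_score(words, significant, max_gap=4):
    """Step f: highest p*p/q over the clusters of one sentence.

    A cluster runs from one significant word to another, with at most
    `max_gap` non-significant words between neighbouring significant words.
    Isolated significant words form no cluster and contribute nothing.
    """
    positions = [i for i, w in enumerate(words) if w in significant]
    best, start = 0.0, 0
    for k in range(1, len(positions) + 1):
        at_end = k == len(positions)
        if at_end or positions[k] - positions[k - 1] - 1 > max_gap:
            p = k - start                                  # significant tokens
            if p >= 2:
                q = positions[k - 1] - positions[start] + 1  # all tokens
                best = max(best, p * p / q)
            start = k
    return best


words = "the adrenal gland releases adrenalin under control of nerve impulses".split()
score = sentence_score(words, {"adrenal", "adrenalin", "nerve", "impulses"})
```

In this example the four significant words fall in a single cluster of nine tokens, so the score is 16/9. Taken literally, the consolidation rule would also merge unrelated short words, since any two tokens of fewer than seven letters differ in fewer than seven positions; the description gives no further condition, so none is added here.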
While the methods employed by Luhn for determining word and sentence
significance have fallen somewhat into disrepute, the technique is
clearly of historical importance. An example of an abstract produced
by this method is given in Figure 2.6. Additional examples of abstracts
produced by this method, as well as data on word representativeness
and word-token counts, may be found in Luhn's papers (26, 33, 34).
2.2.2.2. The ACSI-Matic Study 

The ACSI-Matic study, conducted by IBM (35, 36) for the Army
Department's Assistant Chief of Staff for Intelligence, was (and is)
the only study which led to a computer-based abstracting system as
part of an operational information system.

Exhibit 1

Source: The Scientific American, Vol. 196, No. 2, 86-94, February, 1957

Title: Messengers of the Nervous System

Author: Amedeo S. Marrazzi

Editor's Sub-heading: The internal communication of the body is mediated by chemicals as well as by nerve
impulses. Study of their interaction has developed important leads to the understanding and therapy of
mental illness.

Auto-Abstract* (Exhibit 1)

It seems reasonable to credit the single-celled organisms also with a system of chemical communication by
diffusion of stimulating substances through the cell, and these correspond to the chemical messengers (e.g.,
hormones) that carry stimuli from cell to cell in the more complex organisms. (7.0)**

Finally, in the vertebrate animals there are special glands (e.g., the adrenals) for producing chemical
messengers, and the nervous and chemical communication systems are intertwined: for instance, release of
adrenalin by the adrenal gland is subject to control both by nerve impulses and by chemicals brought to
the gland by the blood. (6.4)

The experiments clearly demonstrated that acetylcholine (and related substances) and adrenalin (and
its relatives) exert opposing actions which maintain a balanced regulation of the transmission of nerve
impulses. (6.3)

It is reasonable to suppose that the tranquilizing drugs counteract the inhibitory effect of excessive
adrenalin or serotonin or some related inhibitor in the human nervous system. (7.3)

* Sentences selected by means of statistical analysis as having a degree of significance of 6 and over.
** Significance factor is given at the end of each sentence.

Figure 2.6 Example of an abstract produced by Luhn's system. (Reproduced
from The IBM Journal of Research and Development).






ACSI-Matic employed selection procedures analogous to those
suggested by Luhn; the importance of this study lies in the rather
novel variations imposed on Luhn's basic techniques. These
modifications included:

a. elaboration of Luhn's sentence scoring technique

b. special treatment of documents with an unusually large
fraction of low-frequency words

c. special treatment of documents with extraordinarily long
sentences

d. choice of sentences, once scored, to form a tentative abstract

e. procedures to reduce redundancy among the sentences selected

The ACSI-Matic scoring technique improved upon Luhn's treatment
of the density of representative words in a sentence. To illustrate
the technique, consider the sentence

NRNNRNNNRNN

where N = a nonrepresentative word (N-word) and R = a representative
word (R-word). R-words were given a value of 1 and non-terminal
sequences of N-words were given a value of $1/2^n$, where n is the number
of N-words between successive R-words. Thus, the sentence above would
be scored

$1 + 1/4 + 1 + 1/8 + 1 = 3 + 3/8$.

This procedure was applied to sentences of documents whose average
sentence length was in the range 18 to 26 words.
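
The scoring rule above can be sketched in a few lines of Python. This
is an illustrative reading of the description, not the ACSI-Matic code:
the function name and the encoding of a sentence as a string of "R" and
"N" markers are assumptions, as is the choice to add nothing when two
R-words are adjacent (the source does not cover that case).

```python
def acsi_matic_score(pattern: str) -> float:
    """Score a sentence encoded as a string of 'R' and 'N' markers."""
    r_positions = [i for i, tag in enumerate(pattern) if tag == "R"]
    score = float(len(r_positions))  # each R-word contributes 1
    # Only N-sequences lying between two R-words (non-terminal) count.
    for left, right in zip(r_positions, r_positions[1:]):
        n = right - left - 1  # number of N-words between successive R-words
        if n > 0:  # assumption: adjacent R-words add nothing extra
            score += 0.5 ** n
    return score


print(acsi_matic_score("NRNNRNNNRNN"))  # 3.375, i.e. 3 + 3/8
```

The leading and trailing N-words of the example contribute nothing,
which is what "non-terminal" is taken to mean here.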



When a document had an average sentence length greater than 26
words, each sentence score, computed as above, was divided by the
square root of the number of words in the sentence to give a "corrected"
score. The procedure slightly favored selection of longer sentences.
On the other hand, when more than 10% of the sentences of a document
exceeded 40 words in length, the unmodified scoring procedure was
employed once again. Thus, the overall effect of these procedures was
to give a slight preference for selection of sentences whose lengths
were in the range 26 - 40 words (other things being equal).
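
A minimal sketch of this length correction follows. The function and
parameter names are hypothetical, and the treatment of documents whose
average sentence length is below 18 words is an assumption, since the
text does not describe that case.

```python
import math


def corrected_score(raw_score, sentence_length, sentence_lengths):
    """Apply the length correction described above to one sentence score.

    sentence_lengths holds the word count of every sentence in the document.
    """
    count = len(sentence_lengths)
    average = sum(sentence_lengths) / count
    long_fraction = sum(1 for n in sentence_lengths if n > 40) / count
    if average > 26 and long_fraction <= 0.10:
        # Long-sentence document: divide by the square root of the length.
        return raw_score / math.sqrt(sentence_length)
    # Average of 26 words or fewer, or more than 10% of the sentences
    # over 40 words: the unmodified score is used. Documents averaging
    # under 18 words are not covered by the text and are left unchanged
    # here (assumption).
    return raw_score
```

For a document of ten 30-word sentences, a raw score of 6 on a 36-word
sentence would be corrected to 6 / 6 = 1 under this reading.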

The above scoring techniques were based on the assumption that
words whose frequency exceeded the average word frequency within the
document were "representative". This assumption was made when 48% to
56% of a document consisted of function words.[25] When the percent of
function words in a document fell outside this range, special treatment
of the document was effected. When there were more than 56% function
words in a document, the list of potentially representative words was
reduced by deleting all words whose frequency of occurrence was greater
than 1% of the word-token count for the document.

If there were less than 48% of function words in a document and
the document contained more than 35% of unit-frequency words, those
words were chosen as representative whose frequency of occurrence was
less than or equal to the average word frequency for the document.

[25] Function words are those words not included in one of the classes:
noun, verb, adjective, adverb. It is emphasized that there must
be maintained a clear distinction between function words and
non-substantive words.

Once all sentences in a document had been scored, a set of sentences
which were potentially members of the abstract was selected as follows.

    The number of sentences in the document is divided by ten.
    If the quotient is more than 20, 20 is subtracted from the
    result and the remainder is divided by 32. The number of
    sentences in the abstract is this quotient plus 20. If the
    document has fewer than 200 sentences, the abstract has
    10 percent of the total number of sentences. (36)
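
The quoted rule can be worked through in code. The sketch below is one
reading of it; the function name is hypothetical, and truncating the
result to a whole number of sentences is an assumption, since the
source does not say how fractions were handled.

```python
def abstract_size(sentence_count: int) -> int:
    """Number of sentences in the abstract under the quoted rule."""
    quotient = sentence_count / 10
    if quotient > 20:
        # Beyond 200 sentences the abstract grows far more slowly:
        # one extra sentence for every 320 additional document sentences.
        quotient = (quotient - 20) / 32 + 20
    return int(quotient)  # assumption: fractions are truncated


print(abstract_size(150))  # 15
print(abstract_size(840))  # 22
```

A document of 840 sentences thus gives 84, then (84 - 20) / 32 + 20 = 22
abstract sentences, so the rule caps abstract length for long documents.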


The n sentences with highest scores were designated as "abstract
sentences" and the n/4 sentences with next highest scores were called
"reserve sentences". Word-tokens were consolidated by a process
analogous to Luhn's method, following which the "abstract sentences"
were examined for possible redundancy. Two sentences were considered
to be redundant if they contained a number of matching words which was
greater than 1/4 the total number of words compared in the two sentences.
Highly redundant sentences were deleted and were replaced by sentences
from the "reserve sentence" set. This process was continued until no
more redundant "abstract sentences" were found or until the set of
"reserve sentences" was exhausted. The sentences that were in the set
of "abstract sentences" at the end of this process constituted the
computer-produced abstract.
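
The redundancy step might be sketched as follows. Several details are
assumptions made for illustration and are not stated in the source:
"matching words" is read as the number of distinct words shared by the
two sentences, "words compared" as the combined length of both
sentences, and the lower-ranked sentence of a redundant pair is the one
replaced. All names are hypothetical.

```python
def is_redundant(a, b):
    """a and b are lists of consolidated word tokens."""
    matching = len(set(a) & set(b))  # assumption: distinct shared words
    compared = len(a) + len(b)       # assumption: combined sentence length
    return matching > compared / 4


def find_redundant_pair(abstract):
    """Return indices (i, j), i < j, of the first redundant pair, if any."""
    for i in range(len(abstract)):
        for j in range(i + 1, len(abstract)):
            if is_redundant(abstract[i], abstract[j]):
                return i, j
    return None


def prune_redundant(ranked_sentences, n):
    """ranked_sentences: token lists sorted by descending score."""
    abstract = list(ranked_sentences[:n])
    reserve = list(ranked_sentences[n:n + n // 4])
    # Continue until no redundant pair remains or the reserve is exhausted.
    while reserve:
        pair = find_redundant_pair(abstract)
        if pair is None:
            break
        _, lower_ranked = pair
        abstract[lower_ranked] = reserve.pop(0)
    return abstract
```

Under this reading, once the reserve set is empty any remaining
redundant sentences stay in the abstract, which is one way to interpret
"until the set of reserve sentences was exhausted".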


The ACSI-Matic study thus employed several interesting criteria
for selecting sentences from a document to form an abstract. The
computational complexity of the algorithms necessary to perform the
tasks outlined above is clearly considerable, and the selection
criteria have not been evaluated.

2.2.2.3. The Oswald Study

The essential distinction of the Oswald study (24) is that he
employed an indexing criterion in the selection of sentences to form
an abstract. This indexing criterion was that groups of words, as well
as single words, should be index entries. Consequently, Oswald chose
sentences for an abstract that scored high in the number of
"representative" word groupings they contained. Representativeness was
determined in a manner similar to that employed by Luhn (26).

Oswald's procedure involved the following steps (24).

1. Determine the count of the tokens of only those words which
are significant in the content of the document.

2. Next, identify the highest-frequency words and note words
adjacent to them which had a frequency of occurrence greater
than one. Such juxtaposed words formed "multiterms".

3. Identify those sentences which contain two or more multiterms,
rank them in descending order of multiterms and select some
number of the highest ranked sentences according to some
prescribed criterion for the length of the abstract.

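
The three steps above can be sketched as follows. Oswald worked by hand,
so this is only an illustration: the set of "significant" words is
supplied by the caller (the subjective step), and the function names,
the `top_k` cutoff for "highest-frequency words", and the treatment of
a multiterm as an ordered pair of adjacent words are all assumptions.

```python
from collections import Counter


def find_multiterms(sentences, significant, top_k=3):
    """sentences: lists of word tokens; significant: set of content words."""
    # Step 1: count tokens of significant words only.
    counts = Counter(w for s in sentences for w in s if w in significant)
    # Step 2: pair each high-frequency word with adjacent words whose
    # own frequency exceeds one.
    top_words = {w for w, _ in counts.most_common(top_k)}
    multiterms = set()
    for s in sentences:
        for left, right in zip(s, s[1:]):
            if (left in top_words and counts[right] > 1) or (
                right in top_words and counts[left] > 1
            ):
                multiterms.add((left, right))
    return multiterms


def rank_sentences(sentences, multiterms):
    """Step 3: indices of sentences with two or more multiterms, best first."""
    scored = []
    for index, s in enumerate(sentences):
        hits = sum(1 for pair in zip(s, s[1:]) if pair in multiterms)
        if hits >= 2:
            scored.append((hits, index))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [index for _, index in scored]
```

The caller would then keep as many of the ranked sentences as the
chosen abstract length allows.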
Since Oswald did not have a computer at his disposal, he was
obliged to use human simulation of these procedures for his study.
Moreover, since the identification of words "significant
in the context of the document" (24) entailed a subjective judgement,
these procedures could not readily be implemented directly on a computer.
Nevertheless, the study is significant for three reasons:

1. for its recognition of the relationship between an index
and an abstract;

2. for its realization that an abstract represents, at least in
part, an attempt to concentrate the essence of a document as
much as possible;

3. for its recognition of the fact that word-strings of length
greater than 1 (multiterms) are important in determining
sentence significance.

2.2.2.4. Word Association Research 

The significance of word-association studies is that they further
emphasize the importance of word clusters (a fact already emphasized
above) in conveying an author's intent. Doyle (37, 38, 39), Bernier
(40), Quillian (41) and others have attempted to represent a measure of
semantic information content (i.e., the measure of some textual
structure's value in conveying an author's intent) through the use of
special representations of language.

Doyle's association map concept is based upon statistical
association criteria and a map could, therefore, be produced by computer
program. The method for generating an association map involves the
creation of an n x n correlation matrix (n = number of key-words to
be correlated), and the correlation of the matrix elements, by means
of the Pearson correlation coefficient (38), for co-occurrence of
key-words in individual documents.[26] A portion of a typical association map
is shown in Figure 2.7. Links between nodes (words) show the word
associations. Arrows indicate that word co-occurrence is due mainly
to a two-word term; the arrows point to the second word of the term.
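
The correlation matrix described above can be illustrated with a short
sketch. It assumes each document is represented simply as the set of
keywords that index it, and it returns 0 when a keyword occurs in every
document or in none (where the coefficient is undefined); both choices,
and all names, are assumptions rather than Doyle's procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: undefined correlations are reported as 0
    return cov / math.sqrt(var_x * var_y)


def association_matrix(documents, keywords):
    """Build the n x n matrix of keyword co-occurrence correlations."""
    # occurrence[k][d] is 1 when keyword k indexes document d, else 0.
    occurrence = [[1 if k in doc else 0 for doc in documents] for k in keywords]
    return [[pearson(a, b) for b in occurrence] for a in occurrence]
```

Strongly positive entries of the matrix would correspond to the links
drawn between nodes of the association map.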

Doyle envisioned the use of an association map as an associatively
organized index which would facilitate the operation of a combined
man-machine searching system. But the association map derived from a single
document represents a telegraphic abstract in which an attempt has
been made to indicate term relations through statistical association.

Quillian's work is superficially similar to that of Doyle, but in
the generation of what one might call "semantic maps" Quillian makes
the associations intellectually rather than statistically. These
semantic maps constitute a memory model which Quillian later employed
in studies of human-like language behavior (41). The significance of
this work for abstracting lies in the strong emphasis on relations
between words (concepts) rather than on the words alone. Although the
details of this work cannot be treated here, the reader is referred
to any of several good papers for this purpose (43, 44, 45).

In addition to the works cited, Bernier (40) and more recently
Avramescu (46) and Fugmann (47) have emphasized the importance of
specifying both words and relations between words in indexes and
abstracts. Nevertheless, there has been no concerted effort to use
semantic structure as a basis for creating abstracts of original
documents. But since abstracting is fundamentally a matter of semantic
content abstraction, it is reasonable to expect that studies along the
lines of the investigations mentioned here would be of value in the
computer-based production of abstracts.

[26] In one experiment, Doyle (37) employed the document collection
and corresponding index terms (keywords) already employed in
another connection by Borko (42).

[Figure 2.7: association map whose nodes are keywords such as College,
Student(s), Counseling, Guidance, Psychotherapy, Therapy, Group,
Learning, Reinforcement, Treatment and Clinical Research.]

Figure 2.7 Typical association map of Doyle, based on Pearson correlation
coefficients. (Reproduced from Automated Language Processing.)

2.2.2.5. The TRW Study (Edmundson and Wyllys)

The objectives of the Thompson Ramo Wooldridge, Inc. (TRW) study
in computer-based abstracting were twofold: the development, first,
of an abstracting system to produce indicative abstracts, and second,
of a research methodology which would permit new text and new
abstracting criteria to be handled efficiently (25, 48, 49, 50, 51, 52, 53, 54).
The research methodology comprised a study of the abstracting behavior
of humans, a general formulation of the abstracting problem and its
relation to the problem of evaluation, a mathematical and logical study
of the problem of evaluation, a mathematical and logical study of the
problem of assigning numerical weights to sentences, and a set of
abstracting experiments employing a cycle of implementation, testing
and improvement. The research concerned with the abstracting behavior
of humans has already been discussed in section 2.1.2.

The evaluation of the quality of abstracts produced by any system
is necessary in order to make improvements in that system. The TRW
group considered five methods of evaluation (25).

1. Intuitive value judgement

2. Creation of "ideal" abstracts to serve as standards of
comparison

3. Construction of college-type test questions on the document
to be answered from abstracts by a sample population
(evaluation of the summary function of an abstract)

4. Retrievability of the document via the abstract (evaluation
of the retrieval function of an abstract)

5. Statistical correlations (applicable only to extracts)

Method 2 was implemented early in the research, with the results giving
an indication of how "human-like" the selection of sentences is. The
method was not considered a final evaluation of abstracts, but only a
rough indication of agreement with human selection of sentences. This
method served primarily as a rejection test for abstracts which were
little better than random selection of sentences. Evaluation methods
4 and 5 were implemented later in the research, to indicate areas of
content improvement.

The TRW study developed a logical, mathematical method for the
assignment of numerical weights to sentences. This study showed that
there is considerable potential in a set of four methods of sentence
selection which they called the Cue, Key, Title, and Location methods.
These methods can be described, briefly, as follows (54).

The Cue method makes use of a list of words which are classified
as bonus words (those that have a positive value, or weight, in
sentence selection), stigma words (words that have a negative weight)
and null words (those words which are irrelevant to sentence selection).

The Key method is based on the frequency of occurrence of words, and
is similar in approach to the method used by Luhn in his pioneering study.

The Title method is based on a glossary of the words of the title
and subtitles (excluding null words). Sentences containing words that
also occur in the title are assigned a higher weight than either
sentences which contain words that occur also in a subtitle or sentences
which have no such words (other things being equal).

The Location method is based on the hypothesis that certain
headings precede important passages and that topic sentences occur early
or late in a document or paragraph. This latter was also Baxendale's
hypothesis, resulting from observations made in her studies of automatic
indexing (27). These four methods are summarized in Figure 2.8.

In the final system the relative weights among the four basic
methods were parameterized in terms of the linear function

$a_1 C + a_2 K + a_3 T + a_4 L$

where $a_1$, $a_2$, $a_3$, and $a_4$ are the parameters (positive integers) for the
Cue, Key, Title, and Location weights, respectively. The mean
percentages of coselection of sentences from the ideal abstract and the test
document for the most interesting methods are shown in Figure 2.9,
with the intervals encompassing the sample mean plus and minus one
sample standard deviation. The Cue-Title-Location method is seen to
have the highest mean coselection score, while the Key method in
isolation is the poorest of the methods. On the basis of these data it
was decided to omit the Key method as a component in the preferred
abstracting process. An example of an abstract produced by use of the
combination of the Cue, Title, and Location methods is shown in
Figure 2.10.

                          Structural Sources of Clues
Linguistic Sources        Body of Document           Skeleton of Document
of Clues                  (Text)                     (Title, Headings, Format)
------------------------  -------------------------  ---------------------------
General Characteristics   CUE METHOD:                LOCATION METHOD:
of Corpus                 Cue Dictionary             Heading Dictionary
                          (995 words)                (90 words)
                          (Includes Bonus, Stigma,   (Location method also uses
                          and Null subdictionaries)  ordinal weights)

Specific Characteristics  KEY METHOD:                TITLE METHOD:
of Document               Key Glossary               Title Glossary

Figure 2.8 Rationale of the four basic sentence selection methods
employed by Edmundson. (Reproduced from The Journal of
the Association for Computing Machinery (54).)

Figure 2.9 Mean coselection scores obtained from the sentence selection
methods employed by Edmundson. (Reproduced from The Journal
of the Association for Computing Machinery (54).)
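
The linear weighting of the four methods can be illustrated as follows.
This is a sketch under stated assumptions, not Edmundson's program: the
per-sentence Cue, Key, Title and Location weights are taken as given
numbers, the function names are hypothetical, and omitting the Key
method is modelled by setting its parameter to zero even though the
text describes the parameters as positive integers.

```python
def sentence_weight(cue, key, title, location, a=(1, 1, 1, 1)):
    """Combined weight a1*C + a2*K + a3*T + a4*L for one sentence."""
    a1, a2, a3, a4 = a
    return a1 * cue + a2 * key + a3 * title + a4 * location


def select_sentences(feature_rows, a, count):
    """Return indices of the `count` heaviest sentences, in document order.

    feature_rows holds one (cue, key, title, location) tuple per sentence.
    """
    weights = [sentence_weight(*row, a=a) for row in feature_rows]
    order = sorted(range(len(weights)), key=lambda i: -weights[i])
    return sorted(order[:count])


rows = [(2, 5, 0, 1), (3, 0, 1, 2), (0, 1, 0, 0)]
print(select_sentences(rows, a=(1, 1, 1, 1), count=1))  # [0]
print(select_sentences(rows, a=(1, 0, 1, 1), count=1))  # [1]
```

In the toy data, dropping the Key component changes which sentence is
selected, which is the kind of effect the coselection comparison
measures.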

The results of the research at TRW indicated that abstracts can
be defined, identified and produced in a computer-based system. The
TRW group concluded that future abstracting methods must take into
account syntactic and semantic characteristics of the language and
text; they cannot rely simply upon gross statistical evidence. Edmundson
concluded that the major task of any further research would be to
identify and define the differences between manual and computer-produced
abstracts and to minimize these differences so that computer-produced
abstracts can supplement, and perhaps compete with, traditional ones
(54).

2.2.2.6. The Earl Study 

The investigation of computer-based informative abstracting and
extracting performed by Earl and her associates at Lockheed Missiles
and Space Company has been aimed at basic research in English morphology,
phonetics, and syntax (55, 56, 57, 58, 59, 60, 61, 62, 63). This study,
which has been supported by the Office of Naval Research since 1964,
has dealt with basic linguistic research as a necessary prerequisite
to the production of abstracts by computer.

During the first three years of the program, a word-data base was
established. This data base along with a part-of-speech algorithm
was used to provide an algorithmic determination of the parts of speech
of written English words. The parts of speech were later used to
determine if there existed any linguistic similarity between sentences
that were used for abstracts. Also, during these years, it was
demonstrated how an English/Russian phrase data base can be used to
develop a technique for obtaining English indexes from untranslated
Russian text.

[Figure 2.10: machine-printed abstract, based on Cue, Title and Location
weights, of the report "Evaluation of the Effect of Dimethylamine Borine
and Several Other Additives on Combustion Stability Characteristics of
Various Hydrocarbon Type Fuels in Phillips Microburner". The extract
retains the report's headings (Summary; I. Introduction; II. Description
of Phillips Microburner; III. Description of Test Apparatus;
IV. Description of Test Fuels; V. Test Procedure; VI. Results;
VII. Discussion; VIII. Conclusions; IX. Recommendations) with the
selected sentences printed under them.]

Figure 2.10 Example of an abstract produced by Edmundson's system.
(Reproduced from The Journal of the Association for
Computing Machinery (54).)

During the third and fourth years of the project, experiments in
the compilation of a "sentence dictionary" of syntactic types began
and compilation of English syntactic word government tables was
undertaken. The hypothesis which the sentence dictionary was used to
support states that if a large group of sentences, as representative
of the language as possible, is processed, classified as "indexable"
or "non-indexable," and assigned a syntactic structure, then when these
structures are sorted, it will be found that like structures have like
index classifications. The structures can be ordered into a "dictionary"
of sentence types, each classified as indexable or non-indexable. When
sentences from a document which is to be abstracted are matched against
the sentence dictionary, those sentences which are indexable would be
candidates for inclusion in the index or in the abstract. Table 2.3
shows the results of experiments designed to test this hypothesis.

Based on these data, it seemed clear that representing a sentence
by part-of-speech strings made too fine a distinction between sentences.
The sentences were then structured into phrases, to cause sentences of
like phrase structure to be grouped. The phrase structure approach to
syntactic patterns gave impetus to development of English syntactic
word government tables. These word government tables contained entries
which reflected the fact that a word's government pattern is often
linked with its semantic meaning, that is, syntactic pattern is a clue
to semantic meaning. Experiments were devised to test the applicability
of the phrase to computer-based abstracting. The data obtained from
these experiments are shown in Table 2.4 (61). From these experiments
it was concluded that both the part-of-speech and the phrase-pattern
methods of syntactic classification are inadequate to separate
indexable from non-indexable sentences. The experimental results shown in
Table 2.4 indicate that there are far too many unique patterns and
that the consistency of index codes tends to decrease with the number
of unique patterns. Based on these results, it was concluded that
indexable and non-indexable sentences cannot be distinguished by
structure alone.

Table 2.3 Statistics of part-of-speech patterns in text (from (59))

                                             Number of Chapters in Data Base
Item                                            3       6       8       9
-------------------------------------------  ------  ------  ------  ------
(1) Number of total patterns represented        18      25      31      34
    by more than one sentence
(2) Number of total patterns represented        14      15      21      23
    by more than one sentence, with a
    consistent index code
(3) Number of total duplicated patterns          3       6       8      12
    common to more than one article
(4) Number of total duplicated patterns          2       3       4       5
    common to more than one article, with
    a consistent index code
(5) Number of one-of-a-kind patterns          1198    2425    2822    3064
(6) Number of total unique patterns           1216    2450    2853    3098
(7) Ratio of the number of one-of-a-kind     0.985   0.989   0.989   0.992
    patterns to number of total unique
    patterns

During the fifth year of the project, Earl and associates developed
a parsing program, initiated some extracting experiments on technical
text, and experimented with automatic indexing of a medical book. In
the sixth year, the sentence dictionary experiment was concluded, the
extracting experiment was completed, a frequency-syntax method of
indexing was conceived and tested, and the concept of English syntactic
word government was expanded while compilation of the tables continued.
During the seventh year, the scope of the parsing program was extended,
preparatory to additional indexing experiments using syntax in
conjunction with frequency counts or word government criteria. Also,
during this period, some studies in describing and abstracting pictorial
structures were undertaken. A critical review of the field was
prepared and a series of experiments using human subjects to describe
aerial terrain photographs was conducted.

Table 2.4 Comparison of part-of-speech and phrase patterns (from (61))

                                             Part-of-Speech   Phrase
Item                                         Patterns         Patterns
-------------------------------------------  --------------   --------
(1) Number of total patterns represented           34             35
    by more than one sentence
(2) Number of total patterns represented           23             15
    by more than one sentence with a
    consistent index code
(3) Number of total duplicated patterns            12             26
    common to more than one article
(4) Number of total duplicated patterns             5             11
    common to more than one article,
    with a consistent index code
(5) Number of one-of-a-kind patterns             3064           3026
(6) Number of total unique patterns              3098           3061
(7) Ratio of the number of one-of-a-kind        0.992          0.988
    patterns to number of total unique
    patterns

While some interesting experimental data have been obtained by
Earl et al., it appears that they are no closer to a system for
computer-based abstracting than when the study began.

2.2.2.7. The Soviet Study

E. F. Skoroxod'ko is carrying on research in automatic abstracting
at the Academy of Sciences of the Ukrainian SSR in Kiev, USSR (64, 65).
His research is based on the assumption that only a method of automatic
abstracting which adapts itself to a text can provide good and stable
results. This assumption is based on the belief that a given text has
individual characteristics and that the optimal selection criteria
and abstracting procedure should be determined based on those
characteristics.

The individual characteristics of a given text are defined by the
form of a semantic network which represents the text. A semantic
network is defined to be a graph where the nodes are associated with
sentences and the arcs are associated with semantic relations between
sentences. Two sentences, A and B, are said to be semantically related
if 1) at least one noun occurs in both sentences A and B, or 2) sentence
A contains a word "a" and sentence B contains a word "b" where "a"
and "b" have been predefined as being semantically related, or 3)
words "a" and "b" in sentences A and B are related with respect to a
given text. The graphs of the semantic structures can be identified
in terms of four important types which are shown in Figure 2.11. The
selection of an appropriate abstracting procedure is based on the type
of semantic structure.

To be published in 1973.


The significance of each sentence is assumed to be directly
proportional to the number of sentences which are semantically related
to it. Thus, nodes in the graph which have the most incident arcs are
defined to be the most significant. The sentence significance also
depends upon the amount of change in the semantic network for that
document when the node for that sentence is removed from the network.
The general significance of a sentence is determined using the following
formula:

$F_i = N_i (M - M_i)$,

where

$F_i$ is the functional weight of a sentence in the text

$N_i$ is the number of arcs incident to a node associated with
a given sentence (i.e., the number of sentences semantically
related to a given sentence)

$M$ is the total number of nodes in a sentence network (i.e.,
the number of sentences in a text)

$M_i$ is the maximum number of nodes in any connected component
of a network after removal of a node associated with a
given sentence (i.e., the number of sentences in the
longest connected segment of text formed after the removal
of the given sentence).
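
The functional weight can be computed directly from the graph. The
sketch below assumes the semantic network is already given as a list of
sentence identifiers and a list of undirected arcs between them; how
the arcs are derived from the text is outside its scope, and the
function names are hypothetical.

```python
def largest_component(nodes, edges):
    """Size of the largest connected component among the given nodes."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        if u in adjacency and v in adjacency:  # ignore arcs to removed nodes
            adjacency[u].add(v)
            adjacency[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


def functional_weights(nodes, edges):
    """F_i = N_i * (M - M_i) for every sentence node i."""
    total = len(nodes)  # M
    weights = {}
    for v in nodes:
        degree = sum(1 for e in edges if v in e)  # N_i
        remaining = [u for u in nodes if u != v]
        weights[v] = degree * (total - largest_component(remaining, edges))
    return weights


# Star-shaped network: sentence 0 is related to sentences 1, 2 and 3.
print(functional_weights([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)]))
# {0: 9, 1: 1, 2: 1, 3: 1}
```

Removing the central sentence breaks the network into isolated nodes,
so it receives 3 * (4 - 1) = 9, while each peripheral sentence receives
1 * (4 - 3) = 1.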

Based on this measure, Skoroxod'ko concludes that in the case of
chained or ringed structure networks, it is impossible to form an
adequate extract from sentences taken from the text since all sentences
have approximately the same semantic value. Automatic abstracts can
only be generated from texts where semantic relationships can be
depicted either as monolithic or piecewise structures.
' - S5^oroxod*ko pres^n^ only one of the procedures developed for the 
adaptable process (6^*^r There ar^'^th^ following seven operations with 
oper^rcloni? 6 and 7 being optional: 

1. The determination of functional weights of all sentences in
   a text.

2. The compression of a text, i.e., the removal of sentences
   which have functional weights considerably less than the
   average functional weight of sentences throughout the text.
   Such sentences are generally examples, explanations, etc.

3. The segmentation of a text, its division into segments which
   are relatively autonomous in semantic and informational
   aspects. A section begins with a sentence whose linear
   coefficient is less than a definite critical value.

4. The selection of one or more sentences with maximum functional
   weight in each segment. A set of such sentences forms an
   abstract of the text (extract).

5. The determination of functional weights of words in an abstract.

6. The removal of words with minimal functional weights from an
   abstract.

7. The translation of an abstract into an informational retrieval
   language (if necessary for information retrieval).
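
Operations 2 and 4 of this list can be sketched as follows. The sketch
rests on several assumptions that are not in the source: the functional
weights are already available as a mapping, "considerably less than the
average" is taken to mean below a hypothetical fraction `ratio` of the
mean, and the segments of operation 3 are supplied by the caller, since
the linear coefficient is not defined here.

```python
def compress(weights, ratio=0.5):
    """Operation 2: drop sentences weighing far less than the average."""
    if not weights:
        return {}
    mean = sum(weights.values()) / len(weights)
    return {i: w for i, w in weights.items() if w >= ratio * mean}


def select(weights, segments, per_segment=1):
    """Operation 4: keep the heaviest surviving sentences of each segment."""
    chosen = []
    for segment in segments:
        alive = [i for i in segment if i in weights]
        # Heaviest first; ties are broken by position in the text.
        alive.sort(key=lambda i: (-weights[i], i))
        chosen.extend(alive[:per_segment])
    return sorted(chosen)  # restore text order for the extract


# Hypothetical weights for a six-sentence text in two segments.
weights = {0: 9, 1: 1, 2: 4, 3: 6, 4: 1, 5: 8}
extract = select(compress(weights), [[0, 1, 2], [3, 4, 5]])
```
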

The approach taken by Skoroxod'ko relies heavily on the co-occurrence
of words in the text and the matching of words to synonym definitions
in the dictionary. The quality of the abstracts would then appear to
be greatly dependent on the construction of the dictionary, which is
manually produced. Skoroxod'ko's publication (64) does not include any
sample abstracts produced by this system nor any mention of evaluation
procedures. Thus, although the theory appears to be well developed, it
is not possible to ascertain the practical effectiveness of this
method.

CHAPTER III. DESIGN OF THE ABSTRACTING SYSTEM

The automatic abstracting system which has been used as the basis
for this research has been described in detail elsewhere (1, 2, 3).
This chapter provides a brief description of the system as it was
originally developed by J. E. Rush, R. Salvador and A. Zamora. The
system has been named ADAM, for Automatic Document Abstracting Method.

1. Philosophy Underlying the Abstracting System

The computer-based abstracting system which has been developed
consists of two important components: a dictionary, called the Word
Control List (WCL), and a set of rules for implementing certain
functions specified for each WCL entry. This combination of rules and
dictionary has been designed and implemented to accomplish the
production of abstracts which are characterized as follows.

a. Their size is approximately 10% of the original document,
   and the use of arbitrary cut-off criteria is avoided.

b. They use the same technical terminology as in the document.

c. Except for actual results, they contain no numbers or
   cardinal expressions.

d. Unconventional or rare characters or abbreviations are
   excluded.

e. Preliminary remarks, equations, footnotes, references,
   quotations, tables, charts, figures, graphs, descriptive
   cataloging, data and the like are not included.

f. Negative results, unless they are the sole results, are
   excluded.

g. They do not contain methodologies of data gathering,
   measurements, preparation of samples, etc., unless they
   are the purpose of the work.

h. No examples, explanations, speculative statements, opinions
   or comparisons are included.

These are mainly things to be excluded from the abstract. On the
positive side, it is desirable that the abstract include:

i. Objectives of the work.

j. Methods used in the work (if they are the main purpose of
   the investigation).

k. Results and conclusions.

To automatically produce abstracts with these characteristics, it is
necessary to identify and eliminate certain sentences of the document.
It is also necessary to identify and select a few sentences for the
abstract, and to retain, by default, certain sentences for the abstract.
These three methods of sentence handling are discussed in Section 1.3.

For efficiency, a language processing program should consider
the largest independent item in its data base as its basic unit. Thus,
in automatic abstracting the basic unit is an original article. Any
approach which considers paragraphs or sentences as basic units is
inadequate because there is an interdependence between these elements
and the remainder of an article. A program which operates on
interdependent units bears the burden of carrying data from one unit
to the next. An automatic language processing program must also be able
to identify and manipulate the linguistic elements of its basic data
unit, whether these elements be words, phrases, clauses or sentences.

2. Data Input

Data input involves two basic processes: the pre-editing of the
original document and preparation of the data in a format acceptable
to the abstracting system. These processes should involve little or
no human intervention.

Pre-editing means pre-processing editing. The text to be
abstracted is edited in some way prior to its being processed by the
abstracting programs. In the work of Edmundson, et al. (4),
pre-editing involved the insertion of special markers (flags) into the
input text to delimit sentences, paragraphs, section headings, etc.
This type of pre-editing has been considered necessary because of the
ambiguous usage of periods, commas, etc. ADAM eschews this type of
pre-editing altogether in favor of automatic recognition of phrases,
clauses, sentences and other elements of the original document. On
the other hand, when manual data input is employed, the keyboard
operator is provided with instructions to omit figures, tables and
other similar graphic material which will not, in any event, find its
way into the abstract. Thus ADAM employs minimal pre-editing of the
input text.

Similarly, ADAM requires no special formatting of the input text.
The system accepts the text as a continuous string and performs all
partitioning (into words, sentences, etc.) of the input text
automatically. It can be made to accept the data in any particular code
(ASCII, BCD, EBCDIC, etc.) with only minor modification of the system.

3. Methods of Sentence Selection and Rejection

In most research on computer-based abstracting, it has been
considered necessary to analyze the conditions under which various
methods of sentence selection are successful, in order to develop
criteria for selecting sentences to form an abstract (5). But clearly
an abstract can also be produced by rejecting sentences of the original
document which are irrelevant to the abstract. Rush, Salvador and
Zamora found that methods for rejecting sentences were more fruitful
than selection methods in most cases (1). It is upon this idea that
ADAM is built. In the following sections, methods of sentence selection
and rejection are discussed, including contextual inference,
intersentence reference, frequency criteria, and coherence
considerations.



3.1. Sentence Elimination 



The exclusion of sentences from the abstract involves the
detection of words or strings of words which identify sentences giving
historical data, results of previous work, examples, explanation,
speculative material and so on. Analysis of documents from a number
of scientific disciplines showed that the set of word strings needed
to accomplish this task need not be large, perhaps a few hundred word
strings serving to eliminate up to 90% of the sentences of a document.
Such word strings are incorporated in a dictionary called the Word
Control List. Sentence elimination is not, however, carried out blindly.
An important aspect of sentence elimination is the location within the
sentence of the "offending" word or phrase. This aspect is considered
under "location criteria."



3.2. Location Criteria

Location criteria for sentence rejection (or selection) are based
on the physical arrangement of the linguistic elements of an article.
This arrangement can be described in terms of the location of a
sentence with respect to the limits of its containing document, or in
terms of the location of phrases, clauses, or words with respect to the
limits of a sentence.

The first of these arrangements (sentence location) is governed by
the style of the author or the editor, with general writing guides
providing advice about the placement of sentences within an article.
Since it is not possible to dictate in the matter of style, the
location of a sentence does not convey an unequivocal criterion for
sentence selection or rejection.

The second location type is really a sentence description. The
location of phrases and words within the sentence is subject to
grammatical rules to which authors and editors adhere. Even a partial
analysis of a sentence yields its basic structure, since the number
of basic sentence frameworks is small.

Punctuation marks are important data for use with location
criteria. A sentence terminated by a question mark can be rejected
out of hand, and sentences located in close physical proximity to it
can also usually be rejected because they are related directly to the
question. Commas are used to delimit clauses and lists (series of
items), and to separate digits in a number. Periods not only delimit
sentences, they also appear in numbers, ellipses and abbreviations.
These usages of punctuation must be differentiated in order to avoid
error. The way in which location criteria are employed in ADAM will be
made clearer below.



3.3. The "Cue" Method of Sentence Rejection and Selection

ADAM uses what Edmundson (6) called the "Cue" method almost
exclusively. Cue words are words or strings of words which are, in
general, unequivocal clues to such things as opinion and subjectivity,
as well as to some positive notions, in this system. Cue words are
contained in a dictionary called the Word Control List (WCL) (see
Section 4), together with codes which indicate their function within a
sentence, or within a particular location in the sentence.

The Cue method provides a powerful approach to sentence selection
or rejection. The method depends on the fact that it is possible to
decide what should or should not be included in an abstract based
upon the presence or absence of particular words or combinations of
words. For example, words which are known to be used
in sentences which state the purpose of a paper serve to indicate that
such sentences should be selected for the abstract. "Our work",
"this paper", and "present research" are expressions which meet this
criterion, but so does "this theme paper". It is necessary, then, to
effect partial matches between WCL entries and words of a sentence to
allow for varied input while maintaining a manageable WCL. A partial
match occurs when one or more words intervene between any two words of
a cue expression.
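
A partial match of this kind can be sketched as an ordered search in
which a limited number of words may intervene between neighbouring
words of the cue expression. This is only an illustration, not ADAM's
matching routine: the `max_gap` limit, the function names and the
simple punctuation stripping are all assumptions made for the example.

```python
def _match_from(cue_words, words, pos, max_gap):
    """Try to place the remaining cue words after position `pos`."""
    if not cue_words:
        return True
    # The next cue word may follow after at most `max_gap` other words.
    limit = min(len(words), pos + 2 + max_gap)
    return any(
        words[j] == cue_words[0]
        and _match_from(cue_words[1:], words, j, max_gap)
        for j in range(pos + 1, limit)
    )


def partial_match(cue, sentence, max_gap=2):
    """True if the cue words occur in order, allowing intervening words."""
    cue_words = cue.lower().split()
    if not cue_words:
        return False
    words = [w.strip('.,;:"()') for w in sentence.lower().split()]
    return any(
        w == cue_words[0] and _match_from(cue_words[1:], words, i, max_gap)
        for i, w in enumerate(words)
    )


# "this paper" is found in a sentence containing "this theme paper".
found = partial_match("this paper", "In this theme paper we describe a method.")
```

Bounding the gap keeps a cue such as "this paper" from matching two
unrelated words at opposite ends of a long sentence; an unbounded
match would be obtained by passing a large `max_gap`.
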

Opinions, references to figures, and other items which should
not be included in an abstract can be identified by cue words such as
"obvious", "believe", "Fig.", "Figure 1", "Table IV", etc. The weight
of a cue word may also depend on its position in a sentence. A sentence
starting with "A" or "Some" is more likely to present detailed
descriptions than a sentence which contains either of these words in a
more central location of the sentence. This is because these words
have a strong quantitative function when they appear at the beginning
of a sentence. Similarly, sentences which begin with participles are
usually conditional in nature, indicating assumptions or conjectures.

Thus, the Cue method is based upon the identification within a
sentence, and also within a particular location in the sentence, of
words or word strings found in a dictionary. It is important,
therefore, that the dictionary be kept small and constant. Neither a
large dictionary nor a rapidly changing one will permit the development
of a viable, operational automatic abstracting system. These
considerations weighed heavily in favor of the decision to employ
rejection criteria almost exclusively. Cue words which indicate that a
sentence contains nothing of import are few in number, and of stable
and unambiguous usage. By contrast, the cue words which indicate
important notions are many in number and are of variable and ambiguous
usage. By using the rejection approach to abstract production, the Word
Control List has been made to contain fewer than 700 entries.

3.4. Intersentence Reference 

Intersentence references give much information about the logical
relationships within the text material, but they require involved
treatment if a coherent abstract is to be produced. If more than one
clause exists in a sentence, then the first clause is indispensable
to the meaning of the sentence. The first clause will usually contain
intersentence references if there are any. Words in the second and
subsequent clauses which require antecedents usually refer to the
first clause. Some cue words that indicate intersentence references
are "these", "they", "it", and "above". When these words have multiple
uses, additional criteria are required to determine if there is
intersentence reference.

A special case of intersentence reference is that between the
title and the sentences of a document. The Title method has as a
premise that the author (generally) describes in as few words as
possible the essence of his paper; it can be assumed, then, that the
words of the title are well chosen and of high significance.
In ADAM, the words of each sentence are matched against the
potential information-carrying words of the title; if any words match,
the sentence becomes a candidate for selection. However, the
possibility that the substantive words of the title may appear
frequently means that additional criteria should be used before any
sentence containing words which also occur in the title is accepted for
inclusion in the abstract.
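
The title match can be sketched as a set intersection between the
content words of the title and those of each sentence. The stop-word
list and the exact-match comparison (no stemming) are assumptions made
for this example; the source does not say how ADAM decides which title
words are potentially information-carrying.

```python
# Hypothetical list of words treated as carrying no information.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def content_words(text):
    """Lower-cased words of `text`, minus punctuation and stop words."""
    words = (w.strip('.,;:"()').lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def title_candidates(title, sentences):
    """Indices of sentences sharing at least one content word with the title."""
    title_words = content_words(title)
    return [i for i, s in enumerate(sentences) if content_words(s) & title_words]


title = "Automatic Abstracting of Journal Articles"
sentences = [
    "The system produces abstracts automatically.",
    "Abstracting is performed by rejecting sentences.",
    "Results are shown in the table.",
]
candidates = title_candidates(title, sentences)
```

A sentence found this way is only a candidate; as noted above, further
criteria are needed before it is accepted.
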

3.5. Frequency Criteria

A simple yet effective way of introducing frequency criteria is
as follows: if any cue expression exceeds a given frequency threshold,
then its value should be reduced. This means that if the cue
expression has a positive weight it should become less positive, and
if it has a negative weight it should become less negative. With these
guidelines it should be possible to successfully produce abstracts of
papers in which cue words are used in unusual ways. The thresholds at
which the weight transitions should take place need to be determined,
but statistical data are needed only for the cue expressions contained
in the dictionary. In ADAM a module has been incorporated, which is
optionally executable, which decreases the strength of both negatively
and positively weighted WCL entries (i.e., makes the entry less
influential) when the WCL entry is found more frequently than desired
in the text.
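
The frequency rule can be sketched with numeric weights. This is an
assumption made for illustration only: ADAM stores one-character codes
rather than numbers, and the threshold, the damping factor and the
simple substring count used here are hypothetical.

```python
def adjusted_weights(cue_weights, text, threshold=3, factor=0.5):
    """Pull the weight of over-used cue expressions toward zero."""
    lowered = text.lower()
    adjusted = {}
    for cue, weight in cue_weights.items():
        # Plain substring count; a real matcher would respect word limits.
        count = lowered.count(cue.lower())
        adjusted[cue] = weight * factor if count > threshold else weight
    return adjusted


# Hypothetical weights: negative for rejection cues, positive for selection.
cues = {"obvious": -2.0, "our work": 3.0}
sample = "obvious " * 5 + "our work"
damped = adjusted_weights(cues, sample)
```

Multiplying by a factor between 0 and 1 makes a positive weight less
positive and a negative weight less negative, as the rule requires.
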

3.6. Coherence Considerations

Regardless of the portion of the original document that finds its
way into the abstract, that abstract should be as coherent as possible
both logically and linguistically. Thus the progression of ideas
presented in the abstract should flow smoothly and each sentence of
the abstract should be linguistically well-formed. This latter
criterion requires some analysis of the sentences selected for the
abstract before the final set can be determined. At present, ADAM
merely checks each sentence for the presence of a verb and rejects
that "sentence" if no verb is found.

4. The Word Control List (WCL)

The WCL consists of an alphabetically ordered set of words and
phrases, which are referred to collectively as word strings, and one
or two associated codes. The entries in the WCL are treated as
functions and each has two arguments: a semantic weight and a syntactic
value. Each function returns a value which indicates whether the
sentence is a candidate for retention or deletion. In general, a WCL
entry is represented as

    WORD STRING*([semantic weight]*[syntactic value])

where WORD STRING is a string of alphanumeric characters and blanks,
the *'s are delimiters and semantic weight and syntactic value are one
character fields. The parentheses and brackets in the above expression
are used to indicate that the presence of the enclosed items is
required or optional, respectively; they do not appear in the WCL. As
will be seen later, entries in the WCL may be varied as desired
without necessitating any change in the programs of the system.
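
Reading such an entry can be sketched as follows. The exact record
layout is an assumption: the sketch supposes that each entry is one
line of the form WORD STRING*s*y, where s and y are the one-character
semantic and syntactic fields and either (but not both) may be left
empty. The sample entries and the names used are hypothetical.

```python
from typing import NamedTuple, Optional


class WCLEntry(NamedTuple):
    word_string: str
    semantic: Optional[str]   # one-character semantic weight, if present
    syntactic: Optional[str]  # one-character syntactic value, if present


def parse_wcl_entry(line):
    """Split one assumed 'WORD STRING*s*y' record into its fields."""
    parts = line.rstrip("\n").split("*")
    if len(parts) != 3:
        raise ValueError(f"expected two '*' delimiters: {line!r}")
    word_string, semantic, syntactic = (p.strip() for p in parts)
    if not word_string or len(semantic) > 1 or len(syntactic) > 1:
        raise ValueError(f"malformed entry: {line!r}")
    if not semantic and not syntactic:
        raise ValueError(f"entry needs at least one code: {line!r}")
    return WCLEntry(word_string, semantic or None, syntactic or None)


# Hypothetical entries: a very positive term and the preposition OF.
entry_a = parse_wcl_entry("OUR WORK*I*")
entry_b = parse_wcl_entry("OF**O")
```

Because the codes travel with the entries, a list read this way can be
changed without touching the program, which is the property claimed
for the WCL.
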

4.1. Hierarchical Rules for Implementing the Functions in the WCL 

When one attempts to determine whether a sentence of the document
is a member of the abstract or not, using a two-valued membership
criterion, it is often found that a sentence is both an element, and
yet not an element, of the set of sentences constituting the abstract
because it contains word strings of both positive and negative semantic
weight. Two alternate solutions of this predicament are available:
1) impose an ordering on the several semantic weights, or 2) determine
a degree of membership of the sentence in the abstract (7). The
former alternative has been chosen for implementation, but it would
be interesting to compare abstracts obtained using each of the methods.
The effect of this choice is to both simplify and speed up processing.

The hierarchy of rules for implementing the semantic weights
from the WCL is shown in Table 3.1. Considerable flexibility is
obtained by incorporating the rules in the program and supplying the
semantic weights externally. The rules can be altered without the
necessity for changing the WCL, and the WCL can be altered
independently of the rules.
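
The first alternative, an ordering imposed on the semantic weights,
can be sketched as a lookup in a priority string. The order used is
that of Table 3.1 (descending priority), restricted to the codes that
can be read with certainty; the function name is hypothetical, and the
sketch only finds the dominant code, since the rules that turn a code
into a retain or delete decision are not reproduced here.

```python
# Assumption: only the clearly legible codes of Table 3.1, highest first.
PRIORITY = "IAKBELCHFJD"


def dominant_code(codes):
    """Return the highest-priority semantic code among those found."""
    ranked = [c for c in codes if c in PRIORITY]
    return min(ranked, key=PRIORITY.index) if ranked else None


# A sentence containing both a very negative (A) and a very positive (I) term.
winner = dominant_code(["A", "I", "B"])
```

With this ordering a sentence that contains both "our work" (code I)
and "obvious" (code A) is governed by the I code, so the conflict
between positive and negative weights is settled without computing a
degree of membership.
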

Table 3.1. Semantic attributes for WCL entries

Semantic Code(a)   Description

I      Used for very positive terms; those which almost
       unequivocally indicate something of importance
       (e.g., our work)

A      Assigned to very negative terms; terms which do
       not belong in an abstract (e.g., obvious,
       previously)

K      Assigned to terms which are related to items of
       positive data content (e.g., important)

B      Parenthetical expressions, terms of low data
       content, or terms which are associated with
       items of low data content (e.g., however)

E      Used for intensifiers and determiners (e.g.,
       many, more)

L      Introductory qualifiers (e.g., once, a)

C      Used for words which require an antecedent (e.g.,
       this, these)

H      Terms which introduce a modifying phrase or
       clause (e.g., whose)

F      Null (assigned to abbreviations)

(?)    Assigned by the program to indicate intersentence
       relationships or relation of sentence to title

J      Continuation of a semantic code assigned previously

D      Delete a word (can be used with any arbitrary WCL
       entry)

(a) Listed in descending order of priority.

4.2. Syntactic Values and Their Use

Since the implementation of the semantic weights discussed in the
previous section requires some syntactic information, a partial
syntactic analysis of each sentence is performed. This analysis is
carried out through use of the syntactic values associated with
entries in the WCL, in conjunction with procedures implemented within
the program. One of ten possible syntactic values may be associated
with an entry in the WCL. These values are shown in Table 3.2,
together with their meanings.

Principal use of the syntactic values is made in an analysis of
the commas in a sentence. For effectively utilizing contextual
inferences, as discussed earlier, three types of commas are
distinguished (numerical commas (e.g., in 12,732) are masked at input):
1) commas which separate phrases, here called "real" commas; 2) commas
which separate the elements of a series, called "serial" commas;
3) commas which set off dependent clauses, called "parenthetical"
commas.

Parenthetical commas cause the phrase or clause they isolate to
be deleted. Serial commas are masked to prevent their being confused
with real commas in the later program steps, but are unmasked at output.
Real commas delimit phrases or independent clauses and this information
is used in implementing some semantic rules.
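
The masking of numerical commas at input can be sketched with a
regular expression. The placeholder character and the rule "a comma
between a digit and a group of exactly three digits" are assumptions
made for this example; the source does not describe how ADAM
recognizes such commas, and the classification of real, serial and
parenthetical commas is not attempted here.

```python
import re

# Hypothetical placeholder, assumed never to occur in the input text.
MASK = "\x00"

# A comma preceded by a digit and followed by exactly three digits.
_NUMERIC_COMMA = re.compile(r"(?<=\d),(?=\d{3}(?!\d))")


def mask_numeric_commas(text):
    """Hide commas that separate digit groups, as in 12,732."""
    return _NUMERIC_COMMA.sub(MASK, text)


def unmask(text):
    """Restore the masked commas at output."""
    return text.replace(MASK, ",")


original = "A total of 12,732 words, or 1,024,000 characters"
masked = mask_numeric_commas(original)
```

After masking, the only comma left in the sample is the one that
separates the two phrases, so later steps cannot mistake a digit
separator for a clause boundary.
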

5. Summary of the Operation of the Abstracting System 

The abstracts that are produced by ADAM can be characterized by
the following six observations.

1. The terminology is the same as the parent document.

2. The style is the same as the parent document.

3. No figures, graphs, footnotes, or examples are included.

4. The size is about 10% of the parent document, on the average.

5. Negative results, unless they are the only results, are
   excluded.

6. Objectives, results, and conclusions are included in the
   abstract.

Table 3.2. Syntactic values for WCL entries

Syntactic Code   Description

A                Article
C                Conjunction
D                Delete the word
F                Null word
J                Continuation of a previous syntactic value
N                Pronoun
P                Preposition
O                Exclusively assigned to OF
Q                Exclusively assigned to TO
R                Exclusively assigned to AS
V                Verb
W                Auxiliary verb
X                Exclusively assigned to IS, ARE, WAS, and WERE
Z                Negatives

The system has been designed to accept documents of different
subject areas. The abstracts that are produced by ADAM vary in
quality, but this variation appears to be due to the differences in
quality of the parent document rather than to differences in subject
matter. Concerning differences in subject matter, it is possible to
produce abstracts which reflect a particular point of view or subject
area through deliberate variation of the Word Control List. Such
tailor-made abstracts could be produced by varying the weights of
existing WCL entries and by adding entries which reflect the desired
viewpoint.

The length of the abstract is not an input parameter to ADAM or a
criterion in the selection process. The lengths of abstracts tend to
fall within a range of 0 - 35% of the length of the parent document.
A long abstract usually reflects that the parent document had many
clearly-stated ideas. The length of the abstracts can be modified
indirectly by altering the contents of the WCL. For example, to
reduce the average length of the abstracts, the WCL entries should be
constructed so that the selection criteria are more stringent and the
rejection criteria are more relaxed.

The abstracts produced by ADAM seem to be quite good and the
efficiency and effectiveness of the selection criteria are very
encouraging. The next step in the design of a computer-based
abstracting system is to develop a method for evaluation of the quality
of the abstracts which will indicate the direction for further
improvements in the system.

CHAPTER IV. THE EVALUATION OF THE QUALITY OF ABSTRACTS

1. The Need for an Evaluation Procedure

In the design of any information processing system, there is a
need for a method of evaluating that system. In the preceding chapter
I described an information processing system which inputs documents
and outputs indicative abstracts. Although I have claimed that high
quality abstracts result from this system, such a claim must be backed
by an objective evaluation of their quality. The evaluation method
must be applicable not only to abstracts which result from this system
but to those produced by other abstracting systems as well.

There are two basic reasons for the development of an objective
evaluation of the quality of abstracts. First, it is necessary to
determine whether computer-produced abstracts are of sufficiently high
quality to be used as a substitute for manually produced abstracts.
Second, if computer-produced abstracts do not equal or surpass manual
abstracts in quality, then it is desirable to learn in what areas the
abstracting systems can be improved in order to produce high quality
abstracts. Although several methods of evaluation have been proposed,
these methods have assumed that manual abstracts should serve as the
standard of comparison and have not provided a set of objective
criteria which can be used to evaluate and improve computer-based
abstracting systems.

It is the aim of this chapter to present a criterion for the
objective evaluation of abstracts. This chapter presents 1) a review
and analysis of existing methods of evaluation, 2) the requirements
for an evaluation criterion, 3) the relation of data and information
to abstract evaluation, 4) the definition of the data content criterion,
and 5) examples of the application of this criterion.

2. Existing Methods of Abstract Evaluation

Many methods for evaluating the quality of abstracts have been
considered and many are currently in use. Perhaps the most widely used
source of data for evaluation is user response. Secondary services
must produce abstracts which are acceptable to their users if they
intend to sell abstract journals. The ultimate test of user acceptance
is in the market place, and although the number of subscriptions sold
reflects more than just the quality of the abstracts, it is an
extremely important factor in the successful publication of abstract
journals. Nevertheless, it is highly desirable to have evaluation
criteria that can be applied before one enters the market place and
prior to the development of a crisis situation. Hence, when I use the
term evaluation, I mean the application of tests to the abstracts
produced by the system that will indicate whether they satisfy the
previously established criteria for acceptability. Of course the market
place test must eventually be applied, but with a greater degree of
assurance that it will be met with success.




Since manually-produced abstracts are frequently used as standards
with which to compare computer-produced abstracts, it is worthwhile
to consider the evaluation techniques used within the manual production
of abstracts. Most abstracting services use editors to review
abstracts for any errors or deficiencies. This process is usually both
an evaluation and an improvement of the abstracts. The process of
editing is designed to help insure the production of quality abstracts
and to provide for a consistency of style in the abstract journals.
Manual editing of computer-produced abstracts could be used, but there
is a need for an evaluation method which can indicate algorithmic
changes in the abstracting system which will produce improved abstracts.

The real problem is 1) the need for an evaluation technique which
does not require a standard model for comparisons and 2) the need for
a constructive evaluation technique: one that says "fair" or "foul"
and, in the latter case, says why. In other words, the technique must
say where deficiencies exist and indicate how to correct them. User
evaluations and editor's comments, although helpful, are inadequate
for this task, for they tend to relate to specific articles and
abstracts and are difficult to generalize into rules with broad
applicability.

Various researchers have attempted to define evaluation techniques
which will result in methods for improving computer-based abstracting
systems. Edmundson lists the following five methods of evaluation
which the Thompson Ramo Wooldridge (TRW) group considered in their
study of automatic abstracting (1):

1. Intuitive value judgement.

2. Creation of "ideal" abstracts to serve as standards of
   comparison.

3. Construction of test questions about the document, to be
   answered from the abstract by a sample population
   (evaluation of the summary function of an abstract).

4. Retrievability of the document via the abstract (evaluation
   of the retrieval function of an abstract).

5. Statistical correlations (applicable only to extract-type
   abstracts).

One or more of these five methods has been used, in some form, by all
of the other researchers in automatic abstracting.

All researchers use the intuitive evaluation to some degree when
they document the potential values of their system, and some seem to use
it exclusively. Rush, Salvador and Zamora claim that ADAM produces
high quality abstracts, a judgement based on their experience with the
production of abstracts at Chemical Abstracts Service (2). Intuitive
value judgements, although widely used, can never serve in the place
of a uniform, objective criterion. DeLucia constructed guidelines to
be used by the abstractor while writing his abstract. These guidelines
then served as a standard for evaluation of the final abstract (3).
Edmundson created target extracts and did statistical correlations
between the computer-produced abstract and the target extract (1, 4).
Edmundson also used some of the other methods listed above. Payne,
Altman, and Munger performed evaluation experiments by asking two
groups of college students to answer test questions about a document
after having been presented with either the document or its abstract
(5, 6). Although various methods of evaluation have been used, no
one method has emerged as an acceptable standard. This is probably
because none of the methods has a firm theoretical rationale or provides
a measure for the primary factors which contribute to the quality of
an abstract.

3. Analysis of Previous Evaluation Measures for Specific Uses of 
Abstracts ^ 

Previous evaluation efforts have been aimed at evaluating the
effectiveness of abstracts that have been designed to serve specific
functions. The adequacy of these evaluation techniques can be examined
in light of the intended use of the abstract. Abstracts serve as
accurate, abbreviated representations of documents (7). Within this
general purpose, there are four identifiable areas where abstracts are
used for specific functions. Abstracts may serve, first, as an alerting
tool. The user scans the abstract to determine if the document is
relevant to his interests. After perusing the abstract, the user
should be able to decide whether to read the document or not. The
abstract which is to be used as an alerting tool might appear on the
first page of the article it abstracts, as the output of an information
storage and retrieval system, or in a publication containing abstracts.
Second, abstracts may serve as a retrospective search tool. A
researcher can scan a set of abstracts to locate relevant documents and
to retrieve specific data. The classification and indexing of abstracts
provides access to the appropriate abstracts for review. Third,
abstracts may be used by indexers as the source of index terms. Using
the abstract can save the indexer time because he need not scan the
entire document. Algorithms to produce indexes by computer have also
been designed to be applied to abstracts. Fourth, abstracts may serve
as the data base in an information storage and retrieval system. The
abstracts can be searched for terms that match a user's query. Also,
the abstracts may be presented to the user to allow him to determine
if that set of documents is relevant. Use of abstracts in an automated
information storage and retrieval system results in a significant
decrease in storage and search requirements over full text storage and
search. Poor quality abstracts can result in degradation of system
performance. All of these applications require that the abstracts
provide an accurate representation of the content of the document they
represent. Let us consider the evaluation of abstracts which will
serve in each of these four functions.

3.1. The Abstract as an Alerting Tool

Abstracts can be used to alert a scientist to the existence of a
document and to provide him with an indication of its content. The
abstract should appear at about the same time as publication of the
complete document to be particularly effective. Because abstracts
which serve as alerting tools must be published with a minimum of
delay, it would be beneficial to produce abstracts by computer from
the machine-readable form of the document available from original
page composition. Use of computer-produced extracts or abstracts
would provide a reduction in cost and time spent in producing abstracts.

One method, used by Edmundson (1), of evaluating extracts which
are to be used as an alerting tool is to compare the test extract with
an "ideal" target extract for the document. The differences between
the test extract and the target indicate areas of deficiency of the
test extract. The best possible test extract would include all of the
sentences of the target and exclude all extraneous sentences. The
comparison between the test and the target is made on a sentence by
sentence basis.

The test extract may be compared to the target extract by means
of statistical correlations. The degree of similarity between the
test and the target extract is based on a statistical correlation
function. Edmundson presents a coefficient of similarity between
extracts that is defined in terms of the number of sentences selected
in common, the number of sentences selected for the test extract, the
number of sentences selected for the target extract, and the total
number of sentences in the original document (1). This method assigns
a numerical value to each test extract based on its similarity to the
target. This method provides an objective ranking of all the test
extracts based on the target extract as the standard for comparison.
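
To illustrate how a coefficient can be computed from these four counts,
the following Python sketch uses the phi coefficient of the two-by-two
selection table. This is an assumed stand-in chosen for illustration; it
is not claimed to be the coefficient Edmundson defines in (1). The
function name and the sample figures are hypothetical.

```python
import math

def extract_similarity(common, n_test, n_target, n_document):
    """Phi coefficient between two sentence selections (assumed stand-in).

    common     -- sentences selected for both the test and target extracts
    n_test     -- sentences selected for the test extract
    n_target   -- sentences selected for the target extract
    n_document -- sentences in the original document
    """
    denominator = math.sqrt(
        n_test * (n_document - n_test) * n_target * (n_document - n_target)
    )
    if denominator == 0:
        raise ValueError("each extract must select some, but not all, sentences")
    return (n_document * common - n_test * n_target) / denominator

# Hypothetical figures: a 100-sentence document and 10-sentence extracts.
identical = extract_similarity(10, 10, 10, 100)  # every sentence in common
partial = extract_similarity(5, 10, 10, 100)     # half the sentences in common
disjoint = extract_similarity(0, 10, 10, 100)    # no sentence in common
```

Under this stand-in, a test extract identical to the target scores
highest and a test extract sharing no sentence with the target scores
below zero, so the test extracts can be ranked against the target.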

There is one inherent problem in this method of evaluation, that
is, how to find the ideal extract to use as a target. There is always
the possibility that several extracts adequately describe the document.
The target extract must be compared to all other possible extracts by
means of an objective evaluation criterion and must be found to be superior
before it can serve as the ideal. Evaluating the target extract can
become as great a problem as evaluating the test extract. Another
deficiency of this method is that it can only be applied to extracts.
It is inapplicable to abstracts because modification of the sentences
of the original document makes it impossible to have any exact matches
between sentences of the test extract and the target.

3.2. The Abstract as a Retrospective Search Tool 

Many secondary services publish journals which contain abstracts
(as well as indexes to the abstracts). These services attempt to
provide comprehensive coverage of a given subject area. Users search
through these volumes to find references to all previous literature
which is relevant to their current work. They are often looking for
the development of certain trends or the introduction of specific
data. This use of abstracts has been modeled by some researchers by
setting up a test situation with controlled variables.

Abstracts have been used as the source of answers for test
questions in several experiments (5, 6, 8, 9). These experiments are
designed to test the amount of data contained in the abstract as
compared to the document and to test the ability of a group of subjects
to answer questions based on this data. These experiments are usually
constructed to reflect a hypothetical user's experience in getting
information from a document, but with controls on many of the variable
factors. All of the subjects are confronted with the same decision
situation and the same data on which to rely. The experiments attempt
to measure the variables of comprehension and the ability to relate
data.

This type of experiment was used by Rath, Resnick, and Savage to
test the usefulness of two types of abstracts as compared with the
usefulness of the complete text or of just the title (8). The subjects
were tested to ascertain whether they were able 1) to determine whether
a document was relevant for a specific purpose and 2) to find out
some data from the document without having to read it in its original
form. From the results of these experiments, they concluded that
"there was no major difference between the Text and Abstract groups in
their ability to pick appropriate documents, but the Text group
obtained a significantly higher score on the examination" (8). These
results were interpreted by the authors as a function of subject
population, test criterion, and document population.

This type of experiment should be carried out with fewer uncontrolled
variables to be more effective. The best abstract in this situation
would indicate the relevance of the document to a specific purpose
and would provide all of the answers to the test questions in the
shortest length. Since these factors are determined by questions on
an examination, the user needs to know enough data to answer each of
the questions. The abstracts could be evaluated in the same manner
that the students' examination papers were graded, based on the number
of questions where a suitable answer was provided. Subjects present
additional variables because of their ability to read, understand,
and remember all items of the abstract. The results of the experiment
are influenced by more than just the quality of the abstracts.
Experiments which test the quality of abstracts based on the ability
of subjects to answer questions on the examination rely on the
assumption that the questions presented indicate the most significant
part of each article. This assumption may be true for some decision
makers, but it will not be true for all decision makers. This method
of evaluation requires a large expenditure of effort to construct the
examinations and to conduct the experiments, but the results of this
evaluation do not appear to justify this amount of effort.

3.3. The Abstract as a Source of Index Entries

Abstracts can be used by an indexer instead of the original
document as a source of index terms (10). When an abstract is to be
used in this manner, it must serve as an accurate representation of
the document. It must be sufficiently complete to serve in place of
the document and it must contain all of the significant index terms
that could have been derived from the original document. The best
abstract for this application would be the one that contained the most
index terms in the least amount of length. For this type of application
a telegraphic abstract might prove to be more suitable than an
abstract in paragraph style. However, the abstract which contains the
most index entries might not be the best abstract for other purposes.
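
The notion of "the most index terms in the least amount of length" can
be made concrete with a small Python sketch. It assumes that the index
terms derivable from the original document are already known, that
length is counted in words, and that a term is "contained" when it
appears as a substring of the lower-cased abstract. All three are
simplifying assumptions, and the names and sample texts are
hypothetical.

```python
def index_term_density(abstract, index_terms):
    """Distinct index terms found in the abstract, per word of abstract."""
    words = abstract.lower().split()
    if not words:
        return 0.0
    text = " ".join(words)
    # Naive substring matching; a real indexer would normalize word forms.
    found = {term for term in index_terms if term.lower() in text}
    return len(found) / len(words)

terms = ["automatic abstracting", "sentence rejection", "evaluation"]
telegraphic = "automatic abstracting; sentence rejection; evaluation"
paragraph = ("This paper describes automatic abstracting by sentence "
             "rejection and discusses evaluation")

telegraphic_density = index_term_density(telegraphic, terms)
paragraph_density = index_term_density(paragraph, terms)
```

Both sample abstracts contain all three terms, but the telegraphic one
does so in fewer words and so scores higher under this measure.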

3.4. The Abstract as a Data Base for Information Storage and Retrieval
Systems

Abstracts are used in some information storage and retrieval systems
to stand as condensed representations of the documents that are available
through the system. These abstracts are used to indicate that a
relevant document is available to match a user's query. A user
requests certain information by formulating the topics which interest
him into a query. The form of the query differs with the system, but
all queries must express the essential items involved in the user's
request. The query is then matched against the abstract to determine
if there is any similarity. The abstracts which match the query are
usually presented to the user along with the reference to the complete
document.

The abstract must serve two distinct functions here. First, it
must provide adequate data to the system to allow retrieval of the
document it represents if and only if that document is relevant,
according to the system's criterion, to the user's query. Second, the
abstract must provide adequate data so the user can determine if the
document is indeed relevant to his need. In this application it is
important to keep the abstracts as short as possible. Additional
length would add to the cost of storage and to the time for search and
would require additional time for the reader to scan the abstracts.

When the abstract serves a retrieval function, its efficiency can
be measured by comparing it with the original document. For example,
consider the case presented by Edmundson (1). For a given system
suppose there exist two documents A and B. In response to a given
query both A and B are retrieved. The user who submitted the query
informs the system that only A is really relevant to his request. If
we replace document A by its abstract A' and document B by its
abstract B' then there are four possible results:

1. The system retrieves both A' and B'. This represents the
same system performance.

2. The system retrieves A' and not B'. This represents a
system improvement. The relevant document has been
retrieved but noise has been eliminated.

3. The system retrieves neither A' nor B'. This represents
a null retrieval situation.

4. The system retrieves B' but not A'. This situation is the
worst case. There is no relevant data, only noise.

If A' and B' are extracts, then a document that was not retrieved in
the full text search cannot be retrieved in the extract search. If
A' and B' are abstracts, there is a possibility of additional, extraneous
data being included in the abstract which was not in the document.
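
The four results above can be sketched as a small decision function in
Python. The function name and the outcome labels are hypothetical; the
sketch assumes, as in the example, that only document A is relevant.

```python
def classify_retrieval(retrieved_a, retrieved_b):
    """Classify a search over abstracts A' and B' when only A is relevant.

    retrieved_a -- True if abstract A' was retrieved
    retrieved_b -- True if abstract B' was retrieved
    """
    if retrieved_a and retrieved_b:
        return "same performance"   # result 1: as in the full text search
    if retrieved_a:
        return "improvement"        # result 2: relevant item, no noise
    if retrieved_b:
        return "worst case"         # result 4: noise only
    return "null retrieval"         # result 3: nothing retrieved

outcomes = {
    (a, b): classify_retrieval(a, b)
    for a in (True, False)
    for b in (True, False)
}
```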

The user's query may be compared to the abstract for evaluation.
The abstract should include all of the terms of the query that are
also in the original document without including any extraneous terms.
For a large set of queries the best set of abstracts could be determined
by finding those abstracts which provided the best performance for
the greatest number of queries.

Once the abstract is retrieved as being relevant to the user's
query, the user must use the abstract to determine if the document is
indeed relevant. This determination is based on his ability to read
the abstract and to make a decision as to whether he should or should
not consult the original. This function of an abstract is similar to
other occasions where it serves as an alerting tool, as discussed in
Section 3.1.

The methods for abstract evaluation that have been developed do
not provide an objective method of evaluating abstracts. All are
defined according to specific applications of abstracts and all rely,
at some point, on a subjective determination of quality. A general
evaluation criterion is needed which can be used for several different
purposes and which reflects a user's experience, but does not rely on
his opinions. This criterion is needed particularly for the design
and improvement of computer-based abstracting systems. In the following
sections, I will attempt to define such an evaluation criterion based
on a strong theoretical foundation.



4. Desiderata of an Evaluation Criterion

A person records the results of his research and publishes this
data in order to communicate his findings to other individuals. He
records and communicates the data because it may possibly be of value
to other individuals. Information can be derived from the data by an
individual if, in some decision-making situation, the data is valuable
in his choice among courses of action.

The purpose of an abstract of such a research record is to convey
some of the data contained in the document to a receiver. The word
"some" is used since, by definition, an abstract is an abbreviation of
its parent document and, therefore, would be unlikely to contain all of
the data in the parent document. An abstract is produced because it is
expected to be of value to someone. A user may consult an abstract to
determine if the document it represents contains some items of data
that will be valuable to him in making his decision. The abstract
would be of value to him if it indicated either that the document does
or does not contain the data. A user may also consult the abstract to
locate a specific item of data. The abstract would then be of value to
him if the item was contained in the abstract.

Thus, the evaluation of the abstract should determine if the abstract
will satisfy the decision maker's need for data. Furthermore, since
the abstract will likely be used by many different decision makers,
evaluation of the abstract should account for this possibility.

5. The Importance of Data

The abstract presents data to the user. Data results from measurement
or observation. Measurement need not be thought of as requiring
determined action by a person. Most measurement is certainly very
casually done, perhaps involuntarily done, in fact. For example, if
a person inserts a thermometer into some water and observes that the
column of mercury of the thermometer corresponds to the mark at 100° C,
then he has made a measurement of the temperature of the water. If he
puts his finger in the water and declares "This water is too hot to take
a bath in!", then he has once again measured the temperature for
bathing. Thus, the presence of numeric values is not necessary to
indicate that a measurement has been made and an item of data is
present.

In an abstract the sentence, "The ABC process yields a product of
20 percent greater durability as judged by standard test #1234,"
provides the reader with data as to the results of the measurement of
the ABC process by the #1234 standard test. The sentence, "This paper
discusses the application of standard test #1234 to the measurement of
product durability," also provides data to the reader. This statement
measures the attributes of the article in terms of its discussion of
certain topics. If a user needs to know the percent of greater durability
from process ABC, then the first sentence would be of value to him
and the second sentence would be of lesser value. If, on the other
hand, the user wanted to locate an article that discussed the application
of standard test #1234, then the second sentence would probably be more
valuable to him. Both sentences present data to the reader, independent
of any value derived through use of this data.

This view of data has been formalized by Landry and Rush (11, 12).
They present the following definition for the unit of data, the data
element:

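Define a "data element" as the smallest thing which can be
recognized as a discrete element of that class of things named
by a specific attribute, for a given unit of measure with a
given precision of measurement.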

They also define a document in terms of this definition.

Define a document, D, as a well-ordered set of data elements.

For instance, one type of data element that the abstracting system must
identify is the sentence. Sentences provide a partitioning of the
document and hence are the class of things being measured. The unit of
measure is a string of characters bounded by periods. The precision
is the cardinality of the set of sentences. At the same time, ADAM
views each sentence as being composed of a well-ordered set of data
elements, words. The unit of measure is the string of characters
bounded by blanks. The precision is the cardinality of the set of words
comprising a sentence.
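
A minimal Python sketch of these two partitionings follows. It is an
illustration only, not ADAM's implementation: it takes "bounded by
periods" and "bounded by blanks" literally, so abbreviations and decimal
points would be split incorrectly. The function names and the sample
text are hypothetical.

```python
def sentences(document):
    """Strings of characters bounded by periods, in document order."""
    return [part.strip() for part in document.split(".") if part.strip()]

def words(sentence):
    """Strings of characters bounded by blanks, in sentence order."""
    return sentence.split()

text = "ADAM reads articles. ADAM produces abstracts."
parts = sentences(text)

# Cardinality of the set of sentences, then of the words in each sentence.
sentence_count = len(parts)
word_counts = [len(words(part)) for part in parts]
```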

Such a document and an abstract, which is a document itself, can
be analyzed in terms of the data elements they contain. The importance
of this notion lies in the fact that, since the data may be useful to
a decision maker in some decision-making situation, the abstract or
the parent document can be assigned a value proportional to the amount
of data it contains.

6. The Relationship Between Data and Information

The value of data arises from the ability of the decision maker to
use the data in making decisions. If a person were trying to decide
whether to read a given document, he might first read the abstract to
determine if the document would be of interest to him. If, after
reading the abstract, he is able to decide either yes, he should read
the document, or no, he should not, then the data has been valuable to
him. We must observe his actions after reading the abstract to determine
if indeed the abstract influenced his course of action. The
relationship implied here between data and the decision can be seen in
the model, proposed by Yovits and Ernst (13, 14), which is shown in
Figure 4.1. This model introduces the following definition:

Information is data of value in decision making.

The importance of this definition for the evaluation of abstracts is
seen through the following arguments. A single data element will
either be valuable or not to a decision maker in a given situation. By
contrast, a set of data elements, such as an abstract, will have a
value in the interval 0-1 which is proportional to the number of data
elements in the set that are actually of value to the decision maker,
relative to the total number of data elements in the set. The
cardinality of a set of data elements is fixed, but the cardinality of
the subset that is of value to a decision maker in a particular
decision-making situation is variable.
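
The value of a set of data elements for one decision maker can be
sketched in Python as the fraction of elements he finds valuable. The
predicate standing in for "of value to the decision maker" is a
hypothetical placeholder, as are the sample data elements; in practice
this judgment is exactly what is hard to observe.

```python
def set_value(data_elements, is_valuable):
    """Fraction of data elements (0 to 1) valuable to one decision maker."""
    elements = list(data_elements)
    if not elements:
        return 0.0
    valuable = [e for e in elements if is_valuable(e)]
    return len(valuable) / len(elements)

# The set is fixed; the valuable subset changes with the decision maker.
abstract = [
    "yield rose 20 percent",
    "standard test 1234 was applied",
    "funding is acknowledged",
]
value_for_chemist = set_value(abstract, lambda e: "yield" in e)
value_for_indexer = set_value(abstract, lambda e: "yield" in e or "test" in e)
```

The same three-element set receives a different value for each of the
two hypothetical users, which is the variability the text describes.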

Thus, the ideal abstract should present to a user the data that
will be of value to him in making his particular decision. While this
is a sufficiently difficult task, this problem becomes further complicated
because a single user might employ the abstract in several decision
situations or many users might use the abstract in a particular
decision situation. In general, an abstract will be used by many
decision makers in many decision situations, so that an abstract should
provide useful data to a range of decision makers over a range of
decision-making situations. But it is this very variability in the use
of the abstract which defeats attempts to evaluate abstracts on the
basis of their usefulness to decision makers.

Figure 4.1 The generalized model of information flow proposed by
Yovits and Ernst (13, 14).

6.1. On the Difficulty of Using "Information Content" as a Basis for
Evaluation of Abstracts

Using the Yovits-Ernst model of a generalized information system,
let us consider the value of a particular document in some
decision-making situation. In order to measure the value of the document
(set of data elements), the following essential factors must be considered:

1. The set of decision makers and their prior knowledge

2. The decision situations

3. The external environment

(Factors 2 and 3 are assumed to be constant; this is an
assumption which is impossible to verify.)

4. The execution function, the decision function, and
the uncertainty involved with the probability of predicted
outcomes

5. The isolation of just those observables that are influenced
by the data contained in the document being studied.

While it would be useful to measure the information derived from an
abstract and from its parent document in order to evaluate them, the
list of factors which must be measured to determine the derived
information makes the task of evaluation quite impractical, if not
impossible.

6.2. On the Usefulness of Data Content as a Basis for Evaluation of
Abstracts

It can be seen that information derivable from a document is a
relative quantity depending on the determination of value, whereas the
quantity of data contained in the document is independent of such a
determination. The relationship between information and data is shown
in the Venn diagram in Figure 4.2. In the figure, the set D represents
the set of data, i.e., a document. The subset of data which is valuable
to a given user A is denoted I_A, the subset valuable to user B is denoted
I_B, and the subset valuable to user C is denoted I_C. The information
which can be derived from a given set of data ranges from a minimum of
0, where none of the data is of value, to a maximum where all of the
data is of value. Thus, the amount of information derived from a set
of data can never exceed the amount of data itself.

Potential utility of the data contained in the abstract can be
generally predicted based on the number of data elements contained in
the abstract relative to the parent document. Since the abstract is
also an abbreviated version of the parent document, it is desirable to
maximize the number of data elements contained in an abstract of a
particular size. The abstract which will serve the largest group of
users should provide an efficient representation of the data contained
in the parent document. The best abstract may not serve each of the
users equally well, but we can predict that it will have the greatest
likelihood of providing useful data to the largest group of users.
The arguments presented in the preceding sections form the basis of an
evaluation method which is described and illustrated in the remainder
of this chapter.

My assumption, and one which I believe is held implicitly by most
abstracting services, is that an abstract should serve a large group of
users in a wide variety of decision-making situations. Consequently,
the abstract should contain an efficient representation of the data of
the parent document. While I hold that this is the best abstract for
general applications, it must be realized that such an abstract may not
be "best" for each individual user. Nevertheless, I believe that a good
general abstract will have the greatest likelihood of providing useful
data to the largest group of users in the widest variety of
decision-making situations over the longest span of time.

An evaluation method, based upon the arguments presented in this
and preceding sections, is described and illustrated in the remainder
of this chapter.

7. A Two-Step Procedure for the Evaluation of Abstracts

In the preceding sections, I have considered the theoretical
foundations for an evaluation criterion. I have argued that an abstract
should present an accurate representation of the data contained in the
parent document in a condensed form. I have developed a two-step
objective procedure to determine how well an abstract meets this goal.
The two steps of this procedure are described below.

Step 1. Determine if the abstract conforms with the criteria
        for an acceptable abstract.
Step 2. Determine the data coefficient for the abstract or
        abstracts that satisfy Step 1.

        A. If there is only one abstract, then the value of the
           data coefficient should be greater than or equal to one.

        B. If there is more than one abstract, of which at least
           one has a data coefficient greater than or equal to one,
           the best abstract will be the one that has the highest
           data coefficient.

In the following sections, I will consider each of these steps in more
detail.

7.1. Step 1

The first step of the evaluation procedure consists of the following
test.

    Determine if the abstract conforms with the criteria for an
    acceptable abstract.

The implementation of the first step depends on specification of the
criteria for an acceptable abstract produced by a given abstracting
system. Every abstract user has his own requirements for what an
abstract should contain and these requirements probably vary with time.
The overlap of these requirements over a large group of users is
uncertain, although probably small. Subcommittee 6 of the American
National Standards Institute (ANSI) Committee Z39 has attempted to
define those qualities of abstracts which should serve as uniform
standards for abstract production (7). These standards are designed
to be applied to the production of abstracts of documents from a wide
range of subject areas. Some abstracting services have individual
specifications that either coincide with or supplant the proposed ANSI
standard (15).

For automatic abstracting development and evaluation research in
this Laboratory, we have chosen to use the ANSI standard as described
by Weil as our goal. This standard instructs the human abstractor or
computer-based abstracting system to (7)

    Keep abstracts of most papers to fewer than 250 words, abstracts
    of reports and theses to fewer than 500 words (preferably on one
    page), and abstracts of short communications to fewer than 100
    words. Write most abstracts in a single paragraph. Normally
    employ complete, connected sentences; active verbs; and the
    third person. Employ standard nomenclature, or define unfamiliar
    terms, abbreviations, and symbols the first time they occur in
    the abstract.

Any abstract which does not meet this standard should be edited or
rewritten to conform with these criteria.
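
Of the instructions quoted above, the word limits are the easiest to
test mechanically. The Python sketch below checks only those limits; it
assumes that words are strings bounded by blanks, and the document-type
labels and function name are hypothetical. The remaining instructions
(single paragraph, complete sentences, nomenclature) are not covered.

```python
# Upper bounds from the quoted standard: "fewer than" each figure.
WORD_LIMITS = {
    "paper": 250,
    "report": 500,
    "thesis": 500,
    "short communication": 100,
}

def within_word_limit(abstract, document_type="paper"):
    """True if the abstract has fewer words than the limit for its type."""
    return len(abstract.split()) < WORD_LIMITS[document_type]

# Hypothetical abstracts of 249 and 250 words.
short_enough = within_word_limit("word " * 249)
too_long = within_word_limit("word " * 250)
```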

In order to implement Step 1, there are two essential questions to
be considered. First, does the abstracting system meet the design
specifications, and second, does user feedback indicate that the design
specifications should be changed? In terms of the discussion of Chapter
.?, Step 1 amounts to a determination 1) of how closely the a priori 
and operationally defined intensions of the abstracting system 
correspond; 2) of how well system intension matches the user's view of 
their purpose. 

Actual tests of which Step 1 is comprised are based upon system
design specifications. This is an important point. The system has
been designed to produce abstracts of a certain nature. Step 1 amounts
in part to an alternative system design which should rate abstracts,
produced by the original design, according to the design criteria
embodied in both designs. Whenever discrepancies are found, it is
necessary to determine which of the two designs gave rise to the
discrepancy and to decide whether the difference is of consequence.
If so, either the abstracting system or the evaluation system must be
altered to eliminate the discrepancy.

Step 1 also takes cognizance of user acceptance of the abstracts
produced by the abstracting system under study and attempts to determine
how to alter the system to meet the user's demands.

I shall not attempt to deal with Step 1 in detail in this
dissertation. However, some observations of a general nature are given
to conclude this section.

Step 1 might include a determination of whether the following
criteria are satisfied by a given abstracting system:

1. Maximum length 

2. Minimum length 

3. Bibliographic citation format 

4. Subject orientation 

5. Error level

6. Style 

7. Sentence completeness 

8. Form (block vs. paragraphed) 

9. Type (indicative, informative, etc.)

10. Timeliness

For ADAM, I will assume that the system has been designed so that the
abstracts produced conform to the desired criteria. If this assumption
is not valid, the system design should be modified.

There may be several abstracts which satisfy the criteria in Step
1 but which differ from one another in some respects. These differences
may be in content, organization, or sentence structure. The differences
may also reflect a particular user's needs for certain topics to be
included in the abstract. Thus if several abstracts of a given document
pass Step 1, there is then a need to choose the best abstract from
among this set. "Best" is used here in the sense of Section 6.2. The
choice of best abstract is accomplished through the implementation of
Step 2.

7.2. Step 2

The second step of the evaluation procedure consists of the
following tests.

    Determine the data coefficient for the abstract or abstracts
    that satisfy Step 1.

    A. If there is only one abstract, then the value of the
       data coefficient should be greater than or equal to
       one.

    B. If there is more than one abstract, of which at least
       one has a data coefficient greater than or equal to one,
       the best abstract will be the one that has the highest
       data coefficient.

The implementation of Step 2 depends upon the evaluation of a defined
ratio, which I have called the data coefficient.
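
Tests A and B can be sketched as a single selection function in Python.
The sketch assumes that the Step 1 verdict and the data coefficient of
each candidate abstract have already been determined; the tuple layout,
the function name, and the sample values are hypothetical.

```python
def choose_best(candidates):
    """Return the name of the best abstract, or None if none is acceptable.

    candidates -- list of (name, passes_step_1, data_coefficient) tuples
    """
    # Only abstracts that satisfy Step 1 and have a data coefficient of
    # at least one are acceptable (test A applied to each candidate).
    acceptable = [c for c in candidates if c[1] and c[2] >= 1.0]
    if not acceptable:
        return None
    # Test B: among acceptable abstracts, the highest coefficient wins.
    return max(acceptable, key=lambda c: c[2])[0]

best = choose_best([
    ("abstract 1", True, 1.4),
    ("abstract 2", True, 2.1),
    ("abstract 3", False, 3.0),  # fails Step 1, so it is never considered
])
```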

7.2.1. Definition of the Data Coefficient

The data coefficient is a function which expresses a relationship
between the data contained in an abstract and the length of the abstract.
Since abstracts are, as I have said, abbreviated representations of the
original document, they should contain as much of the data in the
original document as possible while being much shorter than the original.
Thus, in evaluating abstracts the data coefficient should reflect
this desired property of abstracts. The following definition of the
data coefficient, DC, serves this purpose.

$$ DC = \frac{C}{L} $$

where C is the data retention factor,

$$ C = \frac{\text{the amount of data in the abstract}}{\text{the amount of data in the document}} $$

and where L is the length retention factor,

$$ L = \frac{\text{the length of the abstract}}{\text{the length of the document}} $$

Not only should the DC relate the content and length of an abstract in
a meaningful way, it should make possible both comparative and absolute
evaluations of abstracts. For this latter purpose, an understanding
of the significance of DC values is required.
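
As a worked illustration of the definition, the Python sketch below
computes C, L, and DC from four counts. It assumes that the amounts of
data are counts of data elements and that length is counted in words;
the function name and the sample figures are hypothetical.

```python
def data_coefficient(abstract_data, document_data,
                     abstract_length, document_length):
    """DC = C / L for one abstract of one document."""
    c = abstract_data / document_data        # data retention factor C
    l = abstract_length / document_length    # length retention factor L
    return c / l

# Hypothetical document: 400 data elements in 5000 words.
# Hypothetical abstract: 60 of those data elements in 250 words.
dc = data_coefficient(60, 400, 250, 5000)
```

Here C = 0.15 and L = 0.05, so the abstract keeps three times as large
a share of the data as of the length, and DC is about 3.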

There are three classes of abstracting function: 

1. The class of abstracting functions that reduce length at
   a greater rate than they reduce data content;
2. The class of abstracting functions that reduce length
   and data content at the same rate;
3. The class of abstracting functions that reduce data
   content at a greater rate than they reduce length.
Hypothetical curves illustrating the behavior of these three classes of
abstracting functions, and representing all possible abstracts of a
given document, as viewed from the perspective of the data coefficient,
are given in Figure 4.3. Although smooth curves are shown, it should
be emphasized that the data coefficient is not regarded as a continuous
function. The reason for this, if not already obvious, will be made
clear shortly. The purpose of the curves of Figure 4.3 is to establish
bounds for DC values which represent acceptable abstracts. In general,
we can say that abstracts whose DC values lie above the straight line
will be acceptable and otherwise not. However, it is also desirable
to know what significance attaches to the magnitude of the DC value.
Since the magnitude of the DC value can be interpreted as the slope of
a line from the origin to the point representing the DC value, we are
led to a consideration of the general shape of the curves shown in
Figure 4.3.

The process of abstracting implies that there will be a reduction
in length. The reduction in length can be measured in terms of the
number of characters, words, sentences, or other defined units that
are retained for the abstract relative to the number of such units in
the original document. It would be possible to generate a set of
abstracts to represent each possible length by reducing the length of
the document in unit intervals. For example, a set of abstracts
could be generated by reducing the length of the document one word at
[Figure 4.3: plot of the fraction of content of the document (vertical
axis) against the fraction of length of the document (horizontal axis).
One curve represents reduction in content equal to reduction in length,
one represents reduction in content less than reduction in length, and
one represents reduction in content greater than reduction in length.]

Figure 4.3 Hypothetical curves for content/length comparison in the
evaluation of abstracts.
a time. By selecting words at random, it is possible to produce a set
of abstracts which would provide a set of discrete points on the graph
at each unit of length. The crucial question is the relationship
between the reduction in length and the reduction in data content.

The relationship between data content and length for a set of
possible abstracts for one document could be characterized by connecting
the set of discrete points to form a curve. This curve could be
analyzed to select the one best abstract, in terms of its efficient
data representation, from the set. This curve could also be used to
predict the behavior of the same abstracting system on another similar
document. A set of such curves for a group of representative documents
could be used to characterize the operation of the abstracting system
being evaluated.
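
The generation of such a set of discrete points can be sketched as follows. This is a minimal illustration, assuming that words are the unit of length and that some routine for counting data elements is available; the name `count_data_elements` and the toy counter in the usage example are hypothetical stand-ins and are not part of ADAM.

```python
import random


def random_deletion_curve(words, count_data_elements, seed=0):
    """Delete one randomly chosen word at a time and record (L, C) points.

    `count_data_elements` is a caller-supplied function (hypothetical here)
    returning the number of data elements in a list of words. The document
    is assumed to contain at least one data element.
    """
    rng = random.Random(seed)
    remaining = list(words)
    total_length = len(remaining)
    total_data = count_data_elements(remaining)
    points = [(1.0, 1.0)]  # the document taken as its own abstract
    while len(remaining) > 1:
        del remaining[rng.randrange(len(remaining))]
        length_fraction = len(remaining) / total_length
        content_fraction = count_data_elements(remaining) / total_data
        points.append((length_fraction, content_fraction))
    return points


TOY_VERBS = {"is", "has"}  # toy stand-in for recognizing primary relations


def toy_counter(words):
    """Count words from the toy verb list; a placeholder for real analysis."""
    return sum(word in TOY_VERBS for word in words)


text = "water is common and water has strong bonds".split()
for length_fraction, content_fraction in random_deletion_curve(text, toy_counter):
    print(f"{length_fraction:.3f}  {content_fraction:.3f}")
```

Each printed pair is one discrete point (fraction of length, fraction of content); connecting the points for one document gives one curve of the kind described above.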

The construction of a set of curves would require the generation
of a large number of possible abstracts. Although the task of creating
the abstracts would be difficult, it is possible to experimentally
determine the shape of the curves. I have performed some limited
experiments by generating a set of possible abstracts for one document.
The abstracts were created by selecting words at random to delete.
For each word that was deleted a new abstract, shorter by one word,
was formed. The data content of each of the abstracts was evaluated.
Based on these experiments I have made the following observations about
the nature of the curves in general. The theoretical curves can be
characterized by the relationship between content and length. There
are three possible values for this relationship: the fraction of the
content may be greater than the fraction of the length (C > L), C may
be less than L, or C may equal L. Let us consider each of these three
cases.
If C > L then the abstract production method provides a greater
reduction in L than in C. This is clearly a desirable property because
the abstract is presenting an efficient representation of the data to
the user. The slope of this curve will probably be quite small for
small reductions of length. Since written communication is generally
quite redundant (16), it should be possible to reduce the length by
eliminating some of the restatements of data elements and function
words and phrases without a corresponding reduction in data. At some
point, though, the reduction in data may exceed the reduction in length.
This will occur when the data elements are represented so efficiently
that the removal of even one unit of length removes one unit of data.
For example, if one unit of data is the clause, which contains several
words, and one unit of length is the word, then the removal of the
significant word of the clause, the verb, would reduce the data to zero
but the length would only be reduced by one word. If the evaluation of
an abstracting system showed that all abstracts produced by the system
showed a greater reduction in data than in length, this would be
undesirable and major system redesign would be indicated.

The critical point occurs when the percentage reduction in content
is equal to the percentage reduction in length. The goal of abstracting
is to reduce the length at a faster rate than the content. If the
original document is so well written that it would be impossible to
reduce the length without a corresponding reduction in content, then
the document would serve as its own best abstract. If the document is
evaluated as an abstract of itself, the data coefficient will equal one
since C and L will both be one. Therefore, there will always be at
least one abstract with a data coefficient greater than or equal to one
if we allow the possibility of using the document as an abstract.

Constraints may be placed on the abstracting system to produce
abstracts with a specified maximum length. If, for example, the abstract
must be shorter than 250 words, there is no guarantee that it will be
possible to find an abstract with a data coefficient greater than one.
Under constraints of this nature, the entire set of possible abstracts
may have data coefficients less than one.

Based on the value of the data coefficient we can make the following
generalized assessments of value:

1. If DC < 1 then the abstract is unacceptable

2. If DC = 1 then the abstract is at the minimum level of
   acceptability

3. If DC > 1 then the abstract is acceptable

4. If two abstracts both have DC values greater than one, then
   both are acceptable but the higher DC value indicates the
   better abstract.

If the length of the abstract is 0 then the DC is undefined. This fact
does not limit the effectiveness of the DC since it is logically
impossible to evaluate an abstract that does not exist. The values of
the DC range from zero to the number of data elements.
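
The first three assessments can be expressed as a small decision routine. The sketch below is illustrative only; the function name and the tolerance used for the comparison with one are my own assumptions, introduced because computed ratios rarely equal one exactly.

```python
import math


def assess(dc, tolerance=1e-9):
    """Map a data coefficient to the generalized assessment of value.

    `tolerance` is an assumed allowance for rounding when testing DC = 1.
    """
    if math.isclose(dc, 1.0, abs_tol=tolerance):
        return "minimum level of acceptability"
    return "acceptable" if dc > 1 else "unacceptable"


for value in (1.37, 1.0, 0.829):
    print(value, assess(value))
```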
The determination of the data coefficient represents the second
step of the evaluation procedure mentioned earlier. It represents
first, an absolute measure of abstract acceptability and second, a
criterion to determine which of two abstracts is better. Since the
data content provides an upper bound on the amount of information that
can be derived from the data in an abstract, the data coefficient also
is a prediction of the usefulness of the abstract. In order to apply
the concepts for abstract evaluation to an automatic abstracting system
there must be a definition of a means a) of identifying data elements
and b) of measuring length.

7.2.1.1. Definition of the Units of Data and Length

The use of data elements as a means of evaluating the quality of
abstracts depends upon the identification of data elements within the
abstract and within the document. The method of identification must
define the boundaries of data elements in a meaningful and consistent
manner. The definition of data element given earlier -- a data element,
d, is the smallest thing which can be recognized as a discrete element
of that class of things named by a specific attribute, for a given
unit of measure with a given precision of measurement -- is really
quite flexible. The possibility of a data element being any desired
entity is a convenience from both a descriptive and a theoretical
viewpoint. In order to specify a data element for a given situation,
it is necessary to specify the attribute, the unit of measure, and the
precision of measurement. Once the nature of the data element is
specified, we can apply the data coefficient measure required by
Step 2 of the evaluation procedure I have proposed.

For application to abstract evaluation, the attribute being
measured is data content. The data elements are, in this case, usually
defined to be words, clauses, and/or sentences, but specific numerical
quantities, equations, tables or figures could also be specified as data
elements. The unit of measure could then be, for example, the word,
clause, sentence, etc., and the precision of measurement would be
specified by the number of data elements identified by the unit of
measure. The data elements of a given abstract or document may be
defined by specifying the essential properties necessary for their
identification or by enumeration of all possible data elements. The
manner in which a data element is specified is dependent on the function
of the abstract. The data element must be defined in the context of
its use as the minimum unit of data that has potential value to a
decision maker.

We can measure the accuracy of representation in terms of data
content. We now need a measure of the abbreviation. Perhaps the
simplest measure of length is the number of characters contained in
the document or abstract. The number of bits that a document occupies
when in a computer could be calculated by multiplying the number of
characters times the number of bits per character. (Blanks and special
symbols are included in the list of characters.) Other measures of
length might include the number of words, the number of sentences, or
the number of printed lines. Any uniform criterion for the measurement
of length would be acceptable.
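
Several of the measures of length named above can be computed together, as in the following sketch. The function name is illustrative, and the default of eight bits per character is an assumption made for the example rather than a figure taken from the text.

```python
def length_measures(text, bits_per_character=8):
    """Return several uniform measures of length for one text.

    Blanks and special symbols count as characters, as do line breaks in
    this sketch. `bits_per_character` is an assumed value for illustration.
    """
    characters = len(text)
    return {
        "characters": characters,
        "bits": characters * bits_per_character,
        "words": len(text.split()),
        "lines": len(text.splitlines()),
    }


sample = "Water is common.\nIt dissolves salts."
print(length_measures(sample))
```

Whichever measure is chosen, the same measure must be applied to both the abstract and the document so that the length retention factor L is a ratio of like quantities.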




7.2.1.2. Implementation of the Data Coefficient

In order to implement the data coefficient within the constraints
specified above, a general formulation of the data element is required.
In some of the specific applications of abstracts, as for example
when abstracts are used as a source for index terms from a controlled
vocabulary, it would be possible to enumerate all possible data elements,
where the data element is defined as a single index term. For a
general application it would be difficult to enumerate all data
elements. Furthermore, the chosen data elements should be able to
reflect all of the data contained in the abstract and should not be
limited to a fixed list of possibilities.

A data element should, in a general sense, be defined to be the
minimum unit of data that has potential value to a decision maker. In
an abstract this unit of data might be expressed by a single word, but
in general, several words would be needed to express one complete data
element. One sentence might express one data element and a compound
sentence might express two or more. Thus, it would be inadequate to
merely count the number of words or the number of sentences to determine
the number of data elements in an abstract.

A document conveys data to a reader 1) through the unique
organization of language elements within the document and 2) through the
significance imputed to these language elements outside the framework
of the document. Since the significance imputed to the language elements
depends on the decision maker, his prior knowledge, and his
decision-making situation, we cannot ascertain the value of the data contained
in the document to all potential decision makers. But it is possible
to examine the unique organization of language elements within the
document and the manner in which they convey data to the user. The
examination of the language elements within a document depends upon a
linguistic analysis of the text to determine the structural
representation of a single data element. The quality of the abstract will then
be dependent on the accurate preservation of the data elements of the
document.

The actual implementation of the data content criterion depends on
the specification of meaningful boundaries for a data element. The
identification of content elements in natural language documents has
received a great deal of attention. As Christine Montgomery states
(17), "much of the recent work in linguistics, as well as in
computational linguistics, might be entitled 'In search of a formalism
for content representation'." The identification of concepts by means
of linguistic analysis has received attention because it is significant
to almost all natural language processing endeavors. None of the various
linguistic research efforts has arrived at a definitive method of
content representation but a few basic tenets seem to be emerging.
According to Montgomery (17):

     Perhaps the most encouraging note is the emergence of certain
     fundamental principles to which a majority of these researchers
     are committed. Central among these is the notion of the
     predicate as pivotal in semantic and syntactic analysis.
     ... the term 'predicate' in this context designates any
     relation holding between two or more entities (its arguments
     in the logical sense) or any property of an entity.

Language may be considered as a means of communicating about certain
entities and the relationships between these entities.

Evidence is being found to support the preeminence of the predicate
in all communication. Researchers have identified this preeminence in
several different languages. This feature of linguistic analysis also
appears to be independent of the subject matter of the particular example.
Thus a linguistic analysis based on the centrality of the predicate
could be applied to abstracts of any subject discipline or any language.

I have developed a method for identifying specific data elements.
This method is based on the concept of the centrality of the predicate
and on some specific ideas of Young (18). This method is designed to
provide a specific means of identifying data elements where one data
element is defined to be equivalent to one concept. This formulation
(which is presented in the next section) is only one of several
possible ways of representing data elements. Other means of defining
the data element could certainly be used in the data concentration
criterion. The crucial factor is that the identification and the
preservation in the abstract of the data elements of the document are
essential to good abstracting.

7.2.1.3. The Representation of Data Elements by Name-Relation-Name
Patterns

Words and groups of words, which are called language strings, can
be classified into one of two groups, names or relations. In general
names are used to identify objects or constructs and relations are used
to express the relationships between names. A name, denoted N, is
defined as a language string assigned to a behavior imputed to an
object or construct, as well as those language strings which have only
a linguistic function. A name may be simple, composite or complex. A
simple name is a single word which is assigned to an object or construct.
A composite name is a sequence of simple names. A complex name is an
ordered triple, N_i R N_j, where N_i and N_j are simple, composite, or
complex names and R is a relation. Complex names allow for N - R - N
patterns to appear as a name in another N - R - N pattern.

A relation, denoted R, is defined as any language string that
expresses the relationship between two names. Relations may be either
primary or secondary. Primary and secondary relations may be either
simple or composite. A primary relation is defined as any word which
functions as a verb. A secondary relation is defined as any preposition.
A simple relation is a primary or secondary relation that consists of
only a single word. A composite relation is a relation that is made up
of a sequence of simple relations. If the sequence contains a primary
relation, then it is primary; otherwise, it is secondary.
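
The definitions of names and relations given above lend themselves to a small set of record types. The following is a sketch only; the class and field names are my own, and the per-word verb flags are assumed to be supplied by some prior analysis of the text.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Relation:
    """A simple (one word) or composite (several words) relation."""
    words: Tuple[str, ...]
    verb_flags: Tuple[bool, ...]  # True where the word functions as a verb

    @property
    def is_primary(self) -> bool:
        # A composite relation is primary if it contains a primary relation.
        return any(self.verb_flags)


@dataclass(frozen=True)
class SimpleName:
    word: str


@dataclass(frozen=True)
class CompositeName:
    parts: Tuple[SimpleName, ...]  # a sequence of simple names


@dataclass(frozen=True)
class ComplexName:
    first: "Name"       # N_i: simple, composite, or complex
    relation: Relation  # R
    second: "Name"      # N_j: simple, composite, or complex


Name = Union[SimpleName, CompositeName, ComplexName]

# "index of solvent": a complex name built on the secondary relation "of".
of = Relation(words=("of",), verb_flags=(False,))
index_of_solvent = ComplexName(SimpleName("index"), of, SimpleName("solvent"))

# "can be expressed by": a composite relation that is primary.
expressed_by = Relation(("can", "be", "expressed", "by"), (True, True, True, False))
print(of.is_primary, expressed_by.is_primary)  # False True
```

Because a complex name may itself contain names, the types nest in the same way that N - R - N patterns nest within one another.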

Using these definitions, it is possible to classify all words in
written text as either names or relations. A sentence or part of a
sentence can be described as an ordered triple, N_1 R_p N_2, where N_1
and N_2 are names (simple, composite, complex, or vacuous) and where R_p
is a primary relation (simple or composite, but never vacuous). By
categorizing words of the text into N_1 - R_p - N_2 patterns, it will be
possible to identify data elements.

The name-relation-name pattern expresses the relationship between
two objects or constructs. This formulation corresponds to the pattern
of a simple sentence where the first name corresponds to the subject,
the relation to the verb, and the second name to the object. A simple
sentence traditionally is used to express one idea. This formulation
also presents a method for structuring information. The names may be
used to represent nodes of a graph and the relations edges between the
nodes. The sum of many data elements would form a complete network of
data.
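
The network of data mentioned above can be sketched as an adjacency list in which names are nodes and relations label the edges. The function name and the two sample triples are illustrative assumptions and are not drawn from any analyzed document.

```python
from collections import defaultdict


def build_network(triples):
    """Names become nodes; each relation labels an edge between two names."""
    graph = defaultdict(list)
    for first_name, relation, second_name in triples:
        graph[first_name].append((relation, second_name))
        graph.setdefault(second_name, [])  # make sure the end node exists
    return dict(graph)


# Hypothetical data elements, written as (N_1, R_p, N_2).
sample_triples = [
    ("solvent", "modifies", "B"),
    ("B", "has", "linear relation"),
]
for node, edges in build_network(sample_triples).items():
    print(node, edges)
```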

It is possible to identify N_1 - R_p - N_2 data elements in a text by
the following procedure. First, identify all primary relations.
Second, associate with each primary relation the first name, N_1, and
the second name, N_2. The names may be simple, composite, complex or
vacuous. Each N_1 - R_p - N_2 triple, where R_p is a primary relation, is
a data element. The number of data elements which are expressed by
N_1 - R_p - N_2 patterns will always be greater than or equal to the
number of sentences in the text being examined. The number of data
elements associated with various sentence constructions is given in
Table 4.1. The number of data elements contained in each sentence
corresponds to all basic N_1 - R_p - N_2 patterns.
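
The two-step procedure above can be sketched for a single sentence whose words have already been tagged. This is a deliberately simplified illustration: the tag "V" for words functioning as verbs, the function name, and the treatment of every other word as name material are all assumptions, and the sketch does not apply the counting rules for compound and complex sentences.

```python
def extract_triples(tagged_sentence):
    """Split one tagged sentence into (N_1, R_p, N_2) triples.

    `tagged_sentence` is a list of (word, tag) pairs in which the tag "V"
    marks a word functioning as a verb; all other tags are treated as name
    material. An empty tuple stands for a vacuous name.
    """
    # Step 1: identify primary relations by grouping consecutive verbs,
    # which also groups the words between them into candidate names.
    runs = []  # alternating runs of (is_relation, [words])
    for word, tag in tagged_sentence:
        is_relation = tag == "V"
        if runs and runs[-1][0] == is_relation:
            runs[-1][1].append(word)
        else:
            runs.append((is_relation, [word]))
    # Step 2: associate the neighbouring names with each primary relation.
    triples = []
    for i, (is_relation, words) in enumerate(runs):
        if not is_relation:
            continue
        first = runs[i - 1][1] if i > 0 else []
        second = runs[i + 1][1] if i + 1 < len(runs) else []
        triples.append((tuple(first), tuple(words), tuple(second)))
    return triples


sentence = [("The", "D"), ("solvent", "N"), ("modifies", "V"), ("B", "N")]
print(extract_triples(sentence))
```

In this simplified form the number of data elements in a sentence is just the number of triples returned.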

1 p 2 

In this definition of data element, each N_1 - R_p - N_2 pattern is
considered to convey data to the reader in an amount equal to all other
N_1 - R_p - N_2 patterns. Based on this assumption, all data elements in
the text are given equal weights. If there were some function which
could predict the potential value of certain data elements, it would be
useful to rank the data elements based on this functional weighting.
For example, the data elements might be given a higher weight if keywords
from the title or index were present in either of the names or the
Table 4.1 Summary of the number of data elements associated with sentence
constructions

N - R_p - N Patterns                                  Number of
Within Sentence Patterns                              Data Elements

Simple Sentences
  N_1 R_p N_2                                              1

Compound Sentences
  N_1 R_p1 N_2 'and' N_3 R_p2 N_4                          2
  N_1 R_p1 'and' R_p2 N_2                                  2
  N_1 'and' N_2 R_p N_3                                    2
  N_1 R_p N_2 'and' N_3                                    2
  N_1 'and' N_2 R_p N_3 'and' N_4                          4
  N_1 'and' N_2 R_p N_3 'and' N_4, respectively            2

Complex Sentences
  Each clause, N_1 R_p N_2                                 1
  'that' R_p N_2                                           0
  N_1 R_p 'that'                                           0
  'which' R_p N_2                                          0
  N_1 R_p 'which'                                          0
relation. The data elements might also be weighted based on the number
of times the data element appeared in the original document. Any
meaningful predictor of data element value for a given application would
probably improve the performance of the evaluation measure.
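
A keyword weighting of this kind might be sketched as follows. The function name, the representation of a data element as a triple of word tuples, and the weight of 2.0 for elements containing a keyword are all assumptions chosen for illustration; no particular weighting function is proposed here.

```python
def weighted_content(triples, keywords, keyword_weight=2.0):
    """Sum data element weights, favoring elements that contain a keyword.

    Each triple is (first_name, relation, second_name), each part a tuple
    of words. An element containing any keyword in either name or in the
    relation counts `keyword_weight` (an assumed value); others count 1.0.
    """
    keywords = {keyword.lower() for keyword in keywords}
    total = 0.0
    for first_name, relation, second_name in triples:
        words = {
            word.lower()
            for part in (first_name, relation, second_name)
            for word in part
        }
        total += keyword_weight if words & keywords else 1.0
    return total


elements = [
    (("solvent",), ("modifies",), ("B",)),
    (("author",), ("compared",), ("values",)),
]
print(weighted_content(elements, {"Solvent"}))  # 3.0
```

The weighted totals for the abstract and the document would then replace the simple counts of data elements when the data retention factor C is formed.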

8. Examples of the Application of the Evaluation Criterion 
8.1. Sample Document with Six Possible Abstracts

This section provides an analysis of a sample document, its title,
and its six possible abstracts in terms of the data content criterion.
I will assume that each of the six possible abstracts has satisfied
the criteria in Step 1. This examination will rank the six according
to the criterion given in Step 2. The document and abstracts 1-5 used
as material for this examination were selected because they were used
as the basis for another study by Hirayama on abstract evaluation (19).
The original document was written in Japanese and translated into
English and is shown in Figure 4.4. Five abstracts of this document
were found by Hirayama, written in Japanese, English and German. These
five abstracts, prepared by different abstractors, were translated or
rewritten into English by the same person. The five abstracts are
shown in Figures 4.5 through 4.9. I have included a sixth abstract,
produced by ADAM, and the title for examination. These are presented
in Figures 4.10 and 4.11, respectively.

For the document, title, and abstracts, I have identified the data
elements contained in each. I have indicated the principal word in each
name by a single underline and the words in each primary relation by a
Abstract 1. Absorption max. of polyenes possessing
auxochrome can be expressed by (λmax.)^2 = A - BC^N and
λmax. in various solvents can be calcd. from this formula
by varying B. Approx. linear relation was found to exist
between B and n of the solvent.

Total Words      [illegible]
Data Elements    3
Content          .17
Length           .12
DC               1.37

Figure 4.5 Abstract 1 with data elements identified
Abstract 2. Absorption max. of 57 kinds of polyene
compds. including H-(CH=CH)n-H, were measured in
soln. of MeOH, hexane, petr. ether, ligroine, EtOH,
CHCl3, C6H6, and CS2. A formula common to all these
solvents can be obtained by changing B in the formula
(λmax.)^2 = A + BC for relationship between λmax. and
structure. There is a linear relationship between the value
of B and n of various solvents.

Total Words      63
Data Elements    3
Content          .17
Length           .195
DC               .88

Figure 4.6 Abstract 2 with data elements identified
Abstract 3. The difference of (λmax.)^2, which increase
as N increases, between a certain solvent and EtOH is
shown to be the solvent effect of B, and B has a linear
relation to n of the solvent. Calcd. values of λmax. of
57 compds. in MeOH, hexane, Et2O, EtOH, cyclohexane,
CHCl3, C6H6, pyridine, CS2, petr. ether, or gasoline
showed good agreement with observed values when
B(calc.) = 12.986 + 19.019 n.

Total Words      64
Data Elements    3
Content          .17
Length           .199
DC               .84

Figure 4.7 Abstract 3 with data elements identified
Abstract 4. The formula for calcg. the 1st absorption
max. of polyenes can be expressed by (λmax.)^2 = A - BC^N.
It is shown that the solvent modifies B and that there is
a linear relationship between B and n_D of the solvent.
The calcd. and observed 1st absorption max. are tabulated
for 57 polyenes. The n_D and B values were: MeOH,
1.3288 and 38.42; hexane, 1.3751 and 38.74; petr. ether,
1.37 and 38.82; Et2O, 1.3556 and -; EtOH, 1.3633 and
39.10; light ligroine, 1.38 and 39.17; cyclohexane, 1.4264
and -; CHCl3, 1.4486 and 40.72; C6H6, 1.5017 and 41.32;
pyridine, 1.5085 and -; CS2, 1.6319 and 44.18.

Total Words      101
Data Elements    5
Content          .28
Length           .31
DC               .88

Figure 4.8 Abstract 4 with data elements identified
Abstract 5. The formula given by the author for λmax.
of polyenes in the form (λmax.)^2 = a + B(1 - C^N) is examd.
for its susceptibility to the solvent. It is shown that the
parameter B is characteristic to each solvent and the
following linear relationship to n_D was found: B(calc.) =
12.976 + 19.019 n_D. The agreement is very good between
the observed value and B value calcd. from the above
formula for MeOH, hexane, petr. ether, Et2O, ligroine,
cyclohexane, CHCl3, C6H6, pyridine, and CS2. The
parameters a and C are independent of solvents. The
author calcd. λmax. for 57 polyene derivs. in the above
solvents and compared them to the values given in the
literature.

Total Words      105
Data Elements    7
Content          .39
Length           .33
DC               1.19

Figure 4.9 Abstract 5 with data elements identified
Title. Absorption Spectra and Chemical Structure. II.
Solvent Effect

Total Words      8
Data Elements    0
Content          .0
Length           .025
DC               .0

Figure 4.11 Analysis of the title of the sample document
Abstract 6. A figure shows that there is an approximately
linear relationship between B and the refractive index of
the solvent.

Total Words      19
Data Elements    1
Content          .055
Length           .059
DC               .94

Figure 4.10 Abstract 6, produced by ADAM, with data elements identified
double underline. The total number of data elements and the total
number of words are shown for each immediately below the text. The
totals for each of the abstracts and the document are used to calculate
the fraction of length and fraction of content for each abstract. These
values are plotted on the graph shown in Figure 4.12.

The abstracts, document, and title can all be ranked according
to the value of the data coefficient. This ordering is shown
in Table 4.2. Abstract 1, which represents .17 of the content in .12
of the length, has the highest data concentration and should be selected
as the best abstract of the six possible. Abstract 5 had the highest
number of data elements of any abstract, but it represented one third
of the length of the original, so its data coefficient was only second
highest. Both abstracts 1 and 5 had a higher data concentration than
the original document. Abstracts 6, 4, 2 and 3 showed a greater
reduction in content than in length and are therefore unacceptable by
the data content criterion.
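
The choice made in Step 2 can be sketched as a filter followed by a maximum. The function name is illustrative, and the values are the data coefficients reported for the six sample abstracts.

```python
def select_best(data_coefficients):
    """Return the name of the best acceptable abstract, or None.

    An abstract is acceptable when its data coefficient is at least one;
    among acceptable abstracts the highest data coefficient is best.
    """
    acceptable = {
        name: dc for name, dc in data_coefficients.items() if dc >= 1
    }
    if not acceptable:
        return None
    return max(acceptable, key=acceptable.get)


reported = {
    "Abstract 1": 1.37,
    "Abstract 2": 0.85,
    "Abstract 3": 0.84,
    "Abstract 4": 0.88,
    "Abstract 5": 1.19,
    "Abstract 6": 0.94,
}
print(select_best(reported))  # Abstract 1
```

Returning None when no abstract reaches a data coefficient of one corresponds to the case, noted earlier, in which a length constraint leaves every possible abstract below the minimum level of acceptability.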

This method of evaluation can be implemented to identify N - R_p - N
patterns by means of the text analysis programs used in Chapter V and
a routine to calculate the values of the data coefficient. The most
efficient method would be to utilize the structural analysis data that
results from the analysis for the modification of sentences. The
evaluation phase should be logically separate from the abstracting
algorithm, but it need not be implemented separately.

This evaluation scheme would also provide data on the quality of
the abstracts to be used as feedback to the system. For example, an
[Figure 4.12: plot of fraction of content against fraction of length of
document, with points for the document, its six abstracts, and its
title.]

Figure 4.12 Graph of content/length comparison for the sample document,
its six abstracts, and its title.
Table 4.2 Ranking of the sample abstracts by the Data Content Criterion

Abstract          Data Coefficient

Abstract 1             1.37
Abstract 5             1.19
Document               1.00
Abstract 6             0.94
Abstract 4             0.88
Abstract 2             0.85
Abstract 3             0.84
Title                  0.00
extremely long sentence that contained only one data element should be
modified to make it more concise or eliminated from the abstract.
This evaluation criterion favors concise sentences with clear
N - R_p - N patterns, which are the sentences that convey to the user
the maximum amount of data in the minimum amount of length.
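
Feedback of this kind might be sketched as a screen over the sentences selected for an abstract. The function name and the threshold of 25 words per data element are assumptions chosen only for illustration; a suitable threshold would have to be determined by testing.

```python
def flag_verbose_sentences(sentences, max_words_per_element=25):
    """Return sentences that carry little data for their length.

    `sentences` is a list of (text, number_of_data_elements) pairs. A
    sentence is flagged if it has no data elements or if its words per
    data element exceed `max_words_per_element` (an assumed threshold).
    """
    flagged = []
    for text, n_elements in sentences:
        n_words = len(text.split())
        if n_elements == 0 or n_words / n_elements > max_words_per_element:
            flagged.append(text)
    return flagged


candidates = [
    ("The solvent modifies B.", 1),
    ("word " * 30, 1),
    ("Solvent Effect", 0),
]
print(len(flag_verbose_sentences(candidates)))  # 2
```

Flagged sentences would then be candidates for modification or for elimination from the abstract.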



8.2. Fourteen Abstracts Produced by the Abstracting System ADAM 

This section provides an evaluation, using the data concentration
criterion as an absolute standard, of fourteen sample abstracts produced
by ADAM. Here again I am assuming that all of the abstracts have
satisfied the criteria in Step 1 of the evaluation procedure. The
fourteen documents represent various subject areas, styles of writing,
lengths of articles, journals, and books. The abstracts produced from
these documents also vary widely in terms of quality, length, and
style. The fourteen documents which I used in this study were selected
from a set of documents which were keypunched by students in the
introductory course in Information Storage and Retrieval at The Ohio
State University. Each student in the class keypunched a document for
his own investigation into the possibility of natural language processing
by computer systems. The card decks which I used are duplicates of
these documents which the students keypunched. The students were free
to select any article they desired, so there was no control over the
document selection. I chose fourteen documents that seemed to present
several different features and once the documents were chosen, I did not
alter the sample set. (I chose to examine
fourteen documents because the weight of fourteen data decks and the
output from the abstracting system was all I could carry in one trip
to and from the computer center.)

The examples cited in this section serve only as an indication of
the application of the data content criterion. They do not constitute
a comprehensive test of the operation of the abstracting system. These
examples illustrate certain trends and indicate that large-scale
testing of the abstracting system (as discussed in Chapter V, Section 3.1)
is warranted. I have included comments on each abstract which reflect
my personal feelings on the outcome of each example.

The first example, which is shown in Figure 4.13, is an abstract
of an article entitled "Water" from Chemistry. The complete document
provides a discussion of some of the commonly known properties of
water and of some of the recent research developments. The abstract
does not indicate this aspect of the original document. The original
document contains a discussion of a recent discovery, anomalous water,
which is not even mentioned in the abstract. The sentences of the
abstract are not related to one another and do not appear to be
representative of the total content of the article. The last two
sentences of the abstract do not seem to be of particular significance
to the total abstract. The data coefficient of .829 reflects its low
quality. The abstract omits a great many of the significant ideas of
the text and includes some extraneous verbiage.
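
As a check, the data coefficient for this example can be recomputed from the word and data element counts given in Figure 4.13 (23 of 283 data elements retained in 287 of 2926 words). The calculation uses only those reported counts:

```latex
C = \frac{23}{283} \approx 0.0813, \qquad
L = \frac{287}{2926} \approx 0.0981, \qquad
DC = \frac{C}{L} \approx 0.829
```

Since DC < 1, the abstract reduces content at a greater rate than it reduces length and so falls below the minimum level of acceptability.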

The second example is an abstract of an article from Datamation 
which is entitled "Automated System". The abstract, which is shown 
WATER. #CHRISTOPHER HALL, CHEMISTRY ^^(8), 6-10 (1971).# AFTER ALL, WATER IS THE
MOST COMMON LIQUID IN THE WORLD. ONE EXPLANATION IS THAT, BECAUSE WATER IS SO
WIDELY DISTRIBUTED, IF WE COULD SMELL IT WE WOULD BE SNIFFING IT ALL THE TIME,
AND THAT WOULD BE A GREAT NUISANCE. TO AVOID THIS, OUR OLFACTORY MACHINERY HAS
EVOLVED SO THAT IT IGNORES SUCH COMMON THINGS AS WATER. THE ANGLE H-O-H IS 104
DEGREES AND 27 MINUTES. ALSO, WATER HAS A REMARKABLE CAPACITY FOR DISSOLVING
INORGANIC SALTS. THE STRENGTH OF AN H BOND IS ONLY ABOUT ONE-TENTH THE STRENGTH
OF A NORMAL BOND, SUCH AS THAT BETWEEN THE H AND O ATOMS IN EACH WATER MOLECULE.
EVERY O ATOM PARTICIPATES IN TWO H BONDS AND EVERY H ATOM IN ONE. THE
ARRANGEMENT IS TETRAHEDRAL-EACH O ATOM IS SURROUNDED BY FOUR OTHERS AT THE
CORNERS OF A TETRAHEDRON. TWO OF THESE FOUR BONDS ARE NORMAL H-O CHEMICAL BONDS
HOLDING TOGETHER INDIVIDUAL WATER MOLECULES. THE SITUATION IS REALLY LESS LIKE
SHAKING THE CAREFULLY PACKED BOX OF CUBE SUGAR THAN LIKE SHAKING A DELICATELY
BUILT HOUSE OF CARDS. AS THE H BONDS ARE BROKEN BY THE HEAT PUT INTO THE ICE TO
MELT IT, THE WATER MOLECULES START FALLING INTO THE CAVITIES IN THE STRUCTURE.
INDEED, THERE IS EVIDENCE THAT EVEN AT 100 DEGREES C, LIQUID WATER STILL
CONTAINS SOME UNBROKEN H BONDS. THEORETICAL MODELS THIS PICTURE OF LIQUID WATER,
THE COLLAPSED ICE-STRUCTURE MODEL, UNDOUBTEDLY HAS A LOT OF TRUTH IN IT. THIS
MODEL WAS THOUGHT UP BY J. D. BERNAL AND R. H. FOWLER IN 1933 WHILE THEY WERE
FOGBOUND AT A MOSCOW AIRPORT. THEY WERE LED TO THEIR CONCLUSION BY STUDYING THEIR
RECENTLY OBTAINED X-RAY DIFFRACTION PATTERN OF WATER WHICH SHOWED STRONG
RESEMBLANCES TO THAT OF ICE.



                  Document    Abstract

Total Words         2926         287
Data Elements        283          23
Content                         .0813
Length                          .0981
DC                              .829

Figure 4.13 Computer-produced abstract of "Water"
in Figure 4.14, represents .205 of the length of the original article.
The article reports several developments in the field of system design.
These developments are surveyed in the abstract. The abstract tends to
use too many words to express the concepts presented. This abstract,
although it has a data coefficient of .926, could be edited
to make it more concise. If the abstracts produced by the system are
consistently too long but still contain a high fraction of the data,
then it might be feasible to include an editing step to follow the
abstracting operation.

The third example, shown in Figure 4.15, presents an abstract of an
article on "Pumped Storage" from the Ohio State Engineer. The Ohio State
Engineer is a magazine which is produced at Ohio State and has a limited
distribution. The articles tend to be short and not too technical. The
article on "Pumped Storage" is a typical article and is aimed at an
audience of undergraduate engineers. The abstract for this article
provides an accurate representation of the content and style of the
article. The data coefficient is 1.088 and indicates that the abstract
is acceptable according to the criterion.

The fourth example, shown in Figure 4.16, is an abstract of a
chapter from B. A. Trakhtenbrot's book, Algorithms and Automatic
Computing Machines. The abstracting system was not designed originally
to abstract chapters from books, but this abstract provides an excellent
representation of the content of this section of the book. The chapter
deals with the need for and the development of a precise definition of
the word "algorithm". On reading, the abstract seems to omit the






AUTOMATED SYSTEM. #ROBERT V. HEAD, DATAMATION 17(16), 22-24 (1971).# PERHAPS THE
MOST NOTABLE ACHIEVEMENT TO DATE HAS BEEN THE DEVELOPMENT AND USAGE OF SYSTEM
SIMULATORS LIKE GPSS, SCERT AND CSS TO ASSIST THE SYSTEM DESIGNER IN CONFIGURING
TODAY'S COMPLEX SYSTEMS. WITHOUT SUCH SIMULATORS, RELIANCE ON ANALYTICAL METHODS
WOULD LEAVE THE SYSTEM DESIGNER MUCH MORE VULNERABLE TO POTENTIALLY DISASTROUS
THROUGHPUT ESTIMATING ERRORS. FORMATTED FILE ORGANIZATION. SEVERAL STUDIES HAVE
BEEN CONDUCTED BY IBM FOR THE AIR FORCE WITH THE OBJECTIVE OF IMPROVING THE
DESIGN OF FILES WHICH OPERATE UNDER THE FORMATTED FILE SYSTEM, A DATA MANAGEMENT
PACKAGE WIDELY USED BY U.S. GOVERNMENT AGENCIES. THIS WORK HAS INCLUDED THE
CONSTRUCTION OF TWO FILE STRUCTURE SIMULATION MODELS TO AID THE SYSTEM DESIGNER
IN DEALING WITH COMPLEX DATA STRUCTURES. THE FIRST OF THESE, FOREM I, EMBODIED
ANALYTICAL TECHNIQUES WHICH, WHILE VERY FAST, EXHIBITED SEVERAL DEFICIENCIES;
APPLICATION SYSTEM GENERATOR. THERE HAVE BEEN NUMEROUS EFFORTS BY COMPUTER
MANUFACTURERS AND SOFTWARE COMPANIES TO PERFECT GENERALIZED APPLICATION PACKAGES
FOR SUCH FUNCTIONS AS PAYROLL, ACCOUNTS RECEIVABLE, AND INVENTORY CONTROL.
ESSENTIALLY, THESE PACKAGES HAVE THE OBJECTIVE OF NOT MERELY EASING THE
ANALYST'S JOB BUT ACTUALLY ELIMINATING IT, AT LEAST IN COMMONLY ENCOUNTERED
APPLICATION AREAS. EXPERIENCE HAS SHOWN, THOUGH, THAT WHILE PACKAGES FOR
APPLICATIONS LIKE PAYROLL HAVE BEEN SUCCESSFULLY IMPLEMENTED, A GREAT DEAL OF
CUSTOM TAILORING BY ANALYSTS AND PROGRAMMERS IS OFTEN REQUIRED. PACKAGES CAN
HELP TO FREE THE COMPANY SYSTEMS STAFF FROM CONCERN WITH MUNDANE PROCESSING
PROBLEMS, BUT THEY MAKE LITTLE CONTRIBUTION TO IMPROVED SYSTEM DESIGN
METHODOLOGY. THE APPROACH TO APPLICATIONS TAKEN BY IBM WITH ITS SMALL-SCALE
SYSTEM/3 COMPUTER DOES, HOWEVER. IBM HAS MADE AVAILABLE FOR CERTAIN BASIC
APPLICATIONS, LIKE PAYROLL, AN APPLICATION CUSTOMIZER SERVICE. BY COMPLETING A
DETAILED QUESTIONNAIRE FOR EACH APPLICATION, THE USER INDICATES WHICH PROCESSING
METHODS HE REQUIRES FROM AMONG THOSE AVAILABLE. THESE FORMS ARE THEN PROCESSED
AT AN IBM SYSTEMS CENTER WHICH PRODUCES A SET OF PROGRAMMING AIDS FOR ALL
PROGRAMS, CARD RECORDS AND DATA FIELDS REQUIRED BY THE APPLICATION. THE
PROGRAMMER THEN USES THESE AS THE BASIS FOR PROGRAM CODING. THIS PROCEDURE
INVOLVES A PACKAGE IN THE SENSE THAT A MODEL SYSTEM DESIGN IS MADE AVAILABLE TO
THE USER. WHILE THE DESIGN ITSELF CANNOT BE CHANGED BY THE USER, NUMEROUS
OPTIONS ARE AVAILABLE TO HIM, AND THE LEAD TIME FOR SYSTEM DEVELOPMENT IS
REDUCED.



                  Document    Abstract

Total Words         1777         364
Data Elements        116          22
Content                         .1897
Length                          .2048
DC                              .926

Figure 4.14 Computer-produced abstract of "Automated System"
PUMPED STORAGE. #RICHARD FRENCH, OHIO STATE ENGINEER, 53(3), JAN. 1971, PGS. 4,5,24#
TOPOGRAPHICALLY, THE ROCKY MOUNTAIN STATES ARE IDEAL FOR PUMPED STORAGE;
UNDOUBTEDLY, THE SUCCESS OF THE CHICAGO DEEP PROJECT WILL AFFECT THIS IDEA. PURE
PUMPED STORAGE IS HYDROLOGICALLY INDEPENDENT; PUMPED STORAGE PLANTS ARE
MECHANICALLY SIMPLE AND VERY RUGGED; PUMPED STORAGE PLANTS ARE ALSO
SELF-GOVERNING. WITH THE ADVENT OF MORE EFFICIENT NUCLEAR UNITS AND WITH THE
DEVELOPMENT OF GAS TURBINES, THE COST OF PUMPED STORAGE POWER WILL DROP; I.E.,
FOR EACH GAIN IN THERMAL POWER EFFICIENCY THERE IS AN EQUAL GAIN IN PUMPED
STORAGE EFFICIENCY. SINCE 1926 AND THE ROCKY RIVER PLANT, PUMPED STORAGE
CAPACITY HAS INCREASED BY MORE THAN BOOT. ALONG WITH THIS GREAT UPSURGE HAS COME
COST CUTTING DEVELOPMENTS.



                  Document    Abstract

Total Words         1277         110
Data Elements         96           9
Content                         .0938
Length                          .0861
DC                              1.088

Figure 4.15 Computer-produced abstract of "Pumped Storage"
THE NEED FOR A MORE PRECISE DEFINITION OF "ALGORITHM". #B. A. TRAKHTENBROT,
ALGORITHMS AND AUTOMATIC COMPUTING MACHINES, 52-57 (1963).# AS WE HAVE ALREADY
SEEN, THE ACTUAL APPLICATION OF AN ALGORITHM MAY TURN OUT TO BE VERY LENGTHY,
AND THE JOB OF RECORDING ALL OF THE INFORMATION INVOLVED MAY BE ENORMOUS. UNTIL
RECENTLY, THERE WAS NO PRECISE DEFINITION OF THE CONCEPT "ALGORITHM" AND
THEREFORE THE CONSTRUCTION OF SUCH A DEFINITION CAME TO BE ONE OF THE MAJOR
PROBLEMS OF MODERN MATHEMATICS. IT IS VERY IMPORTANT TO POINT OUT THAT THE
FORMULATION OF A DEFINITION OF "ALGORITHM" MUST BE CONSIDERED NOT MERELY AN
ARBITRARY AGREEMENT AMONG MATHEMATICIANS AS TO WHAT THE MEANING OF THE WORD
"ALGORITHM" SHOULD BE. THE DEFINITION HAS TO REFLECT ACCURATELY THE SUBSTANCE
OF THOSE IDEAS WHICH ARE ACTUALLY HELD, HOWEVER VAGUELY, AND WHICH HAVE ALREADY
BEEN ILLUSTRATED BY MANY EXAMPLES. WITH THIS AIM, A SERIES OF INVESTIGATIONS
WAS UNDERTAKEN BEGINNING IN THE 1930'S FOR CHARACTERIZING ALL THE METHODS WHICH
WERE ACTUALLY USED IN CONSTRUCTING ALGORITHMS. THE PROBLEM WAS TO FORMULATE A
DEFINITION OF THE CONCEPT OF ALGORITHM WHICH WOULD BE COMPLETE NOT ONLY IN FORM
BUT MORE IMPORTANT, IN SUBSTANCE. VARIOUS WORKERS PROCEEDED FROM DIFFERENT
LOGICAL STARTING POINTS, AND BECAUSE OF THIS, SEVERAL DEFINITIONS WERE
PROPOSED. IT TURNED OUT THAT ALL OF THESE WERE EQUIVALENT, AND THEY DEFINED THE
SAME CONCEPT.



                  Document    Abstract

Total Words         2257         203
Data Elements        157          16
Content                         .1019
Length                          .0907
DC                              1.124

Figure 4.16 Computer-produced abstract of "The Need for a More Precise
Definition of 'Algorithm'"
definition of "algorithm". This is understandable because the
definition is given in a subsequent chapter of the book and
not in the text being abstracted. The abstract has a data coefficient
of 1.124 and represents .091 of the length of the original. In this
specific case, ADAM is shown to be applicable to more than just
journal articles.

The fifth example, shown in Figure 4.17, is another case where the
abstracting program provided surprisingly good results. The abstract is
of an article entitled "The Clavichord and How to Play It" which appeared
in Clavier. The abstract provides an accurate representation of the
content of the original with sentences selected from both the description
of the instrument and the techniques for playing it. The original
article used several quotations from musicians to elaborate main points
of the article. All but one of these quotations were omitted from
the abstract. The one quotation that was included expresses "the
especially remarkable features of clavichord music", an idea central to
the article. This abstract has a data coefficient of 1.450, the second
highest of the set of sample abstracts.

The sixth example, shown in Figure 4.18, provides .140 of the
content in .147 of the length of the original document. The abstract
provides an accurate representation of the content of the article
"Solving Artwork Generation Problems by Computer" which appeared in
Electro Technology. This abstract might be improved by editing it to
make it more concise. Its data coefficient of .953 reflects its general
mediocrity.






THE CLAVICHORD AND HOW TO PLAY IT. #MARGERY HALFORD, CLAVIER 9(2), 38-41
(1970).# ESSENTIALLY, THE CLAVICHORD IS A SHALLOW RECTANGULAR BOX WHOSE FRAGILE
STRINGS, UNDER LIGHT TENSION, ARE STRUNG HORIZONTALLY FROM A SINGLE BRIDGE OVER
A THIN SOUNDBOARD. THE KEYS ARE SIMPLE LEVERS WITH A BRASS BLADE CALLED A
TANGENT MOUNTED VERTICALLY ON THE FAR END. THE SOUND PRODUCED IS EXTRAORDINARILY
RICH IN OVERTONES. THE TONE OF THE CLAVICHORD DOES NOT EXIST READY-MADE AS IT
DOES ON THE PIANO AND HARPSICHORD; IT IS FORMED AND SHAPED BY THE FINGER, AS ON
A BOWED STRINGED INSTRUMENT, WITH THE RESULT BEING A GENUINE, DIRECT, LIVING
FEEL OF THE STRING. AS LONG AS HIS FINGER REMAINS IN CONTACT WITH THE KEY,
THE PLAYER RETAINS CONTROL OF THE SOUND. THE CLAVICHORD IS THE LEAST MECHANIZED
AND THE MOST RESPONSIVE OF ALL KEYBOARD INSTRUMENTS IN THAT IT MEETS THE PLAYER
HALFWAY IN ITS INSTANT AND FAITHFUL TRANSMISSION OF HIS SLIGHTEST MUSICAL
INTENTIONS. EMBELLISHMENTS CAN BE PLAYED CRISPLY AND BRILLIANTLY. SHAKES, SNAPS,
APPOGGIATURAS, TRILLS, TURNS, MORDENTS, AND SLIDES -- ALL SO CHARACTERISTIC OF
THE PERIOD WHEN THE CLAVICHORD ENJOYED ITS GREATEST POPULARITY -- ARE IDEALLY
SUITED TO THE INSTRUMENT'S EXQUISITE CLARITY AND RICHNESS OF TONE. THE ACTION IS
SHALLOW AND VIRTUALLY WEIGHTLESS. IT IS A PHENOMENON OF THE DOUBLE-ENDED LEVER
THAT THE TONE PRODUCED BY A STRIKING FORCE WILL SOUND BETTER, SWEETER, AND
RICHER AT MAXIMUM LEVER LENGTH. FOR THIS REASON, THE KEYS OF THE CLAVICHORD ARE
PLAYED AS NEAR TO THE FRONT EDGES AS POSSIBLE. EXCEPT FOR THE PLAYING OF
OCTAVES, THE THUMB IS NEVER USED ON A RAISED KEY; DISPLAY PIECES OF A VIRTUOSO
CHARACTER ARE GENERALLY UNSUITED TO THE PERSONAL QUALITIES OF THE CLAVICHORD.
CRAMER SAYS THAT THE ESPECIALLY REMARKABLE FEATURES OF CLAVICHORD MUSIC ARE
FLUIDITY, SUSTAINED MELODY DIFFUSED WITH EVER-VARYING LIGHT AND SHADOW, THE USE
OF CERTAIN MUSICAL SHADING AND ALMOST COMPLETE ABSTINENCE FROM PASSAGES WITH
ARPEGGIOS, LEAPS, AND BROKEN CHORDS;
SOLVING ARTWORK GENERATION PROBLEMS BY COMPUTER. #KENNETH SANDERSON, ELECTRO
TECHNOLOGY 84(5), 63-65 (1969).# COMPUTER-AIDED DESIGN TECHNIQUES NOW CAN
PRODUCE AUTOMATICALLY MASK-ARTWORK FROM PUNCHED-CARD INPUTS WITH UP TO 3-1
SAVINGS IN TIME AND COST. CHIP ARTWORK IS PRODUCED AT 10 X. THE 10 X CHIP
ARTWORK THUS GENERATED IS DIRECTLY MOUNTED IN A STEP-AND-REPEAT CAMERA TO
PRODUCE A 1X PLATE OF A WAFER LEVEL OF MANY CHIPS. THERE ARE SEVERAL METHODS OF
ENTERING THE GEOMETRIC DESCRIPTION OF A MASK INTO THE COMPUTER. ONE METHOD IS TO
DESCRIBE THE GEOMETRIES WITH THE AID OF A DESIGN LANGUAGE PUNCHED ON CARDS. A
SECOND METHOD IS TO ENTER DATA BY MEANS OF AN INTERACTIVE GRAPHICAL DISPLAY
I'^^I^c; .^^^^ METHOD HAS ADVANTAGES AS WELL AS LIMITATIONS. DATA ENTRY VIA 
PUNCHED CARDS ALLOWS MORE ENGINEERS TO USE THE SAME SYSTEM IN THE SAME PERIOD OF
TIME. IT IS ALSO TRUE THAT THE DEFINITION OF SPECIFIC DIMENSIONS IS MORE EASILY
ACCOMPLISHED WHEN NUMBERS ARE PUNCHED INTO CARDS, INSTEAD OF ADJUSTING A LIGHT
PEN TRACKING SYMBOL TO A SPECIFIC COORDINATE. A FURTHER LIMITATION TO CRT SCREEN
coJ?Tcv^V.I''^^c^?cr°^ SPECIAL DEVICES OR ADAPTATIONS TO ENABLE THE ENGINEER TO 
^n^^irlrcr^^.c^^^^^^ DIMENSIONS THAT HE REQUIRES. TO RETAIN THE SPECIAL 
t.lt l^l^..^^ METHODS AND TO AVOID THE RFSTRICTING LIMITATIONS, A COflMON 

DATA BASE IS USED FOR INTERNALLY DESCRIBING THE GEOMETRIES OF MASK- ARTWORK. THIS 
A'^K^GENER^T^oTSis^E^ C APABIL ITY/PLEX IBI LI TY OF THE COMPUTER AIDED 
The seventh example is the abstract of "Magnetic Bubbles" shown in
Figure 4.19. The article appeared in Scientific American and contained
5068 total words, making it the longest of the sample documents. The
sentences selected for the abstract reflect the content of the entire
article. The abstract, which is .187 of the length of the original, is
too long to be written as a single paragraph. The abstract could be
improved by either editing it or adding procedures to enable the
abstracting system to paragraph the output. The data coefficient value
of 1.136 reflects the fact that although the abstract is long, it
contains .212 of the data as I have measured it.
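
As a rough check on these figures, the data coefficient can be recomputed from the two rounded fractions quoted in the text. The sketch below assumes that the coefficient is the content fraction divided by the length fraction (the relationship suggested by the values reported for the other examples); the helper name is hypothetical. Dividing the rounded fractions gives about 1.134 rather than 1.136, and the second part of the sketch shows that this gap is within what rounding each fraction to three decimals can produce.

```python
def dc_from_fractions(content_fraction: float, length_fraction: float) -> float:
    """Data coefficient taken as content fraction over length fraction (assumed)."""
    return content_fraction / length_fraction


# Rounded fractions quoted for the "Magnetic Bubbles" abstract.
content_fraction = 0.212
length_fraction = 0.187

approx_dc = dc_from_fractions(content_fraction, length_fraction)

# Each quoted fraction may be off by up to half a unit in the third decimal.
half_unit = 0.0005
low = dc_from_fractions(content_fraction - half_unit, length_fraction + half_unit)
high = dc_from_fractions(content_fraction + half_unit, length_fraction - half_unit)

print(f"DC from rounded fractions: {approx_dc:.3f}")
print(f"Range consistent with rounding: {low:.3f} to {high:.3f}")
```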

The eighth example, shown in Figure 4.20, is the abstract of
"Educational Decision Making" which appeared in Today's Education. The
title is essential to the preservation of the intended meaning of the
document in the abstract. The first several sentences in the abstract
do not clearly point out that the type of decision making being discussed
refers to the nation's educational system. The abstract, although not
in error, is not as clearly stated as it might be. The data coefficient
of .958 reflects the need to improve this abstract somewhat.

The ninth example is taken from Psychology Today and appears in
Figure 4.21. The article, entitled "Families can be Unhealthy for
Children and Other Living Things", presents a strong theme with
supporting evidence. The theme of the article can be best summarized
by the sentence which appears first in the abstract, "The myth of the
family blinds us to the dangers of our normal child-care practices".
This sentence indicates the strong opinions of the author. The article
MAGNETIC BUBBLES. #ANDREW H. BOBECK, H. E. D. SCOVIL, SCIENTIFIC AMERICAN
224(6), 78-90 (JUNE, 1971).# NOW LET US LOOK AT A THIN WAFER CUT FROM A SPECIALLY
SYNTHESIZED SINGLE CRYSTAL OF MAGNETICALLY ANISOTROPIC MATERIAL. WHEN THE WAFER
IS VIEWED BY POLARIZED LIGHT ONE SEES A PATTERN OF WAVY STRIPS REPRESENTING
DOMAINS. IN HALF OF THE STRIPS THE TINY INTERNAL MAGNETS POINT UP, IN THE OTHER
HALF THEY POINT DOWN. DEPENDING ON THE ORIENTATION OF THE POLARIZING FILTER,
ONE SET OF THE STRIPS WILL LOOK BRIGHT AND THE OTHER DARK. THE TWO SETS OF
STRIPS OCCUPY EQUAL AREAS. NEXT LET US IMMERSE THE WAFER IN AN EXTERNAL
MAGNETIC FIELD PERPENDICULAR TO THE WAFER AND OBSERVE WHAT HAPPENS WHEN WE
SLOWLY INCREASE THE STRENGTH OF THE FIELD. AS THE STRENGTH IS RAISED THE WAVY
STRIPS WHOSE MAGNETIZATION IS OPPOSED BY THE FIELD BEGIN TO GET
NARROWER. THE PROCESS CONTINUES INTO THE SMALL CIRCLES WE CALL BUBBLES. THE
BUBBLES ARE ACTUALLY CYLINDERS SEEN END ON. RAISING THE EXTERNAL FIELD
STILL FURTHER CAUSES THE BUBBLES TO SHRINK UNTIL FINALLY THEY DISAPPEAR
ALTOGETHER. DEPENDING ON THE MATERIAL, THE BUBBLES HAVE A DIAMETER RANGING
FROM A FEW MICRONS TO SEVERAL HUNDRED. THEY ARE STABLE OVER A THREE-TO-ONE
RANGE IN DIAMETER. EACH BUBBLE ACTS LIKE A TINY MAGNET AFLOAT IN THE SEA OF A
MAGNETIC FIELD OF OPPOSITE POLARITY. THE EXTREME MOBILITY OF THE BUBBLE CAN BE
DEMONSTRATED BY MOVING A FINE MAGNETIZED WIRE ACROSS THE SURFACE OF THE
WAFER WHILE OBSERVING THROUGH THE MICROSCOPE HOW THE BUBBLES CAN BE PUSHED
EFFORTLESSLY IN ANY DIRECTION. AT THE SAME TIME THE BUBBLES REPEL ONE ANOTHER
AND MAINTAIN A FAIRLY UNIFORM SPACING BECAUSE THEY ARE ALL SIMILARLY
POLARIZED. ONE CAN SHOW THAT IF THE EXTERNAL FIELD THAT PRODUCES THE BUBBLES
IS HELD CONSTANT WITHIN A RANGE OF PLUS OR MINUS TWENTY PERCENT, THE BUBBLES ARE
COMPLETELY STABLE AND CAN BE MOVED ABOUT INDEFINITELY. THUS WE HAVE
DUPLICATED ON A MICROSCOPIC SCALE OBJECTS AS DURABLE AND AS IMPENETRABLE AS
BILLIARD BALLS, WITH THE ADDED ADVANTAGE THAT THEY REPEL ONE ANOTHER. MOREOVER,
BUBBLES CAN BE CREATED ANYWHERE THEY ARE DESIRED, AND THEY CAN BE DESTROYED BY
TECHNIQUES WE SHALL DESCRIBE BELOW. THE FIRST MAGNETIC MATERIALS FOUND TO HAVE
THE DESIRED PROPERTIES FOR STUDYING THE NEW BUBBLE TECHNOLOGY WERE
ORTHOFERRITES, A SPECIAL CLASS OF FERRITES WITH THE CHEMICAL FORMULA RFEO3,
WHERE R REPRESENTS YTTRIUM OR ONE OR MORE RARE-EARTH ELEMENTS. OTHER
CRYSTAL-GROWING METHODS HAVE ALSO BEEN STUDIED, AND RECENTLY GOOD WAFERS HAVE
BEEN CUT FROM SINGLE-CRYSTAL RODS PULLED DIRECTLY FROM THE MELT. IN THE MOST
SATISFACTORY GARNET SAMPLES THE BUBBLE DIAMETER IS ABOUT THREE MICRONS, WHICH
ALLOWS THE PACKING IN OF A MILLION BUBBLES PER SQUARE INCH. BY WRAPPING THE
WIRE WITH DRIVING COILS IT WOULD BE POSSIBLE TO MOVE THE MAGNETIC SLUGS
THROUGH THE WIRE AT HIGH SPEEDS, MUCH AS OIL IS PUMPED THROUGH A PIPELINE. ONE
IMPORTANT DRAWBACK WAS THAT NO PRACTICAL WAY COULD BE FOUND TO MOVE SLUGS
BETWEEN WIRES EXCEPT BY READING A SLUG OUT OF ONE WIRE AND WRITING IT INTO
ANOTHER. THE DOMAIN WALLS AT EACH END OF A MAGNETIC SLUG DO NOT CUT THROUGH THE
WIRE AT RIGHT ANGLES BUT EXTEND FORE AND AFT IN THE SHAPE OF TWO LONG CONES.
CONTROL REQUIRES THE CREATION OF MAGNETIC DRIVING FIELDS, MAGNETIC FIELDS WITH
COMPONENTS IN THE PLANE OF THE WAFER. TWO GENERAL METHODS ARE AVAILABLE. THE
FIRST METHOD IS CALLED CONDUCTOR ACCESS. THE SECOND METHOD, CALLED FIELD
ACCESS, INVOLVES IMMERSING THE ENTIRE WAFER IN EITHER A PULSATING OR A ROTATING
MAGNETIC FIELD THAT ACTS ON THE BUBBLES BY MEANS OF CAREFULLY PLACED SPOTS OF
MAGNETIC MATERIAL THAT CONCENTRATE THE FIELD. IF AN ORDINARY BINARY CODE IS
USED, A BUBBLE STANDS FOR ONE AND THE ABSENCE OF A BUBBLE STANDS FOR ZERO.
BUBBLES CAN READILY BE MOVED IN TWO DIMENSIONS BY ADDING A SECOND SET OF LOOPS
AT RIGHT ANGLES TO THE FIRST. THE TROUBLE WITH CONDUCTOR METHODS IS THAT A
GREAT MANY ACCURATELY PLACED CONDUCTORS WHOSE DIMENSIONS ARE COMPARABLE TO THE
SIZE OF BUBBLES MUST BE INTERCONNECTED WITH EXTERNAL-ACCESS CIRCUITS. THIS
PROBLEM IS GREATLY SIMPLIFIED BY THE FIELD-ACCESS APPROACH. ONE FIELD-ACCESS
METHOD INVOLVES RHYTHMICALLY RAISING AND LOWERING THE OVERALL MAGNETIC
BIAS ON THE WAFER SO THAT BUBBLES ALTERNATELY SHRINK AND EXPAND. UNDER THE
INFLUENCE OF THE ROTATING MAGNETIC FIELD A NEW BUBBLE WILL BE PRODUCED FOR EVERY
COMPLETE REVOLUTION OF THE FIELD, AND IT WILL BE PROPAGATED TO THE RIGHT.
IT IS ALSO A SIMPLE MATTER TO DIVIDE ANY STREAM OF BUBBLES BY TWO BY CREATING A
LITTLE TRAP, OR BYPASS, THAT SHUNTS EVERY OTHER BUBBLE TO ONE SIDE AND REMOVES
IT FROM THE MAINSTREAM. ELECTRONIC TELEPHONE-SWITCHING SYSTEMS CLOSELY
RESEMBLE LARGE DIGITAL COMPUTERS. REPLICATION, WHICH IN A CONVENTIONAL
SEMICONDUCTOR LOGIC SYSTEM IS CALLED "FANOUT," IS THE DUPLICATION OF EXISTING
BINARY STATES. SINCE BINARY DATA ARE GENERALLY CONSUMED WITHIN CALCULATION
CENTERS, THE ABILITY TO REPLICATE THE DATA FOR FUTURE MANIPULATIONS IS
ESSENTIAL. TO MAKE REPLICATION POSSIBLE IN OUR BILLIARD-BALL MODEL THE FIRST
MODIFICATION REQUIRED IS TO DIVIDE THE BASIC CELL INTO TWO CELLS, ONE ABOVE THE
OTHER. STATES CORRESPONDING TO ONE OR ZERO ARE SYMMETRICAL AND ARE DEFINED
ARBITRARILY. WE HAVE BEEN MOST SUCCESSFUL WITH FIELD ACCESS BY ROTATING AN
IN-PLANE FIELD WITH A STRUCTURE CONSISTING OF T'S AND BARS OR SIMILAR SHAPES.
TO ENTER DATA WE SELECTIVELY TRANSFER BUBBLES FROM A BUBBLE RESERVOIR, WHICH IS
IN REALITY A MINOR LOOP EQUIPPED WITH A BUBBLE-GENERATOR AT ONE END AND A
BUBBLE-EATER AT THE OTHER. WE ANTICIPATE THAT MAGNETIC BUBBLES WILL PROVIDE
LARGE-CAPACITY INFORMATION STORAGE OF HIGH RELIABILITY AT VERY LOW COST.
EDUCATIONAL DECISION MAKING. #WILLIAM L. PHARIS, JOHN C. WALDEN, LLOYD E.
ROBISON, TODAY'S EDUCATION 58(7), 52-54 (1969).# PEOPLE WHO TRADITIONALLY HAVE
HAD LITTLE OR NO VOICE IN DECISION MAKING ARE UNEQUIVOCALLY STATING, "WE WILL BE
INCLUDED." SOME CRITICS WOULD MODIFY AND OTHERS WOULD DESTROY THE PRESENT
STRUCTURE AS A PRELUDE TO CREATING A NEW FRAMEWORK. ESSENTIALLY, THIS MONSTER
HAS TWO HEADS: THE LEGAL SYSTEM CONSISTING OF THE FORMAL GOVERNMENTAL BODIES AND
OFFICIALS AT FEDERAL, STATE, AND LOCAL LEVELS WHO EXERCISE CONSTITUTIONAL,
STATUTORY, AND JUDICIAL AUTHORITY IN REGARD TO EDUCATION AND THE EXTRALEGAL
NETWORK COMPOSED OF THOSE PERSONS, GROUPS, AND ORGANIZATIONS THAT ARE NOT PART
OF THE FORMAL, LEGAL STRUCTURE BUT THAT DO INFLUENCE ITS DECISION-MAKING
PROCESS. THE TWO ARE INTERDEPENDENT AND CONSTANTLY INTERACTIVE. PROPONENTS OF
SPECIFIC AID PROGRAMS MAINTAIN THAT SUCH ASSISTANCE WILL NOT DESTROY STATE AND
LOCAL AUTHORITY. THEY ARGUE THAT STATE AND LOCAL OFFICIALS CAN REFUSE FEDERAL
AID IF THEY SO DESIRE AND, MOREOVER, THERE ARE OPTIONS WITHIN ANY FEDERAL AID
PROGRAM WHICH PERMIT LOCAL OFFICIALS TO MAINTAIN THE INTEGRITY OF THEIR OWN
EDUCATIONAL PROGRAMS. THE SCHOOL BOARD MUST TAKE THE FINAL, FORMAL ACTION TO
LEGALIZE A DECISION, BUT IN DETERMINING THE PROCESS BY WHICH DECISIONS WILL BE
REACHED, THE BOARD MAY CONSULT WITH OTHER GROUPS OR EVEN VOLUNTARILY ENTER INTO
COLLECTIVE BARGAINING AGREEMENTS WITH OTHER GROUPS. IN NO WAY DOES A BOARD OF
EDUCATION ABROGATE ITS STATUTORY RESPONSIBILITY BY PROVIDING A MEANS THAT ENABLE
OTHER GROUPS TO HAVE A VOICE IN MAKING DECISIONS. THE SCHOOL BOARD, RECOGNIZING
THAT SELECTION OF STAFF IS A PROFESSIONAL DECISION, COULD THEN RATIFY THE
TEACHERS' SELECTIONS. IN SUM, THE MOST IMPORTANT FACTOR AMONG A NUMBER OF THOSE
AFFECTING THE LEGAL STRUCTURE FOR DECISION MAKING WILL BE THE POLITICAL
DECISIONS MADE BY THE AMERICAN BODY POLITIC DURING THE 1970'S. PROFESSIONAL
EDUCATORS WILL PLAY A ROLE IN MAKING THOSE POLITICAL DECISIONS, AND IN SO DOING
THEY MUST ADDRESS THEMSELVES TO FUNDAMENTAL QUESTIONS REGARDING THE FORMAL
DECISION-MAKING PROCESS. FOR LACK OF A BETTER TERM, THIS HUGE GROUP MIGHT BE
CALLED THE "GENERAL PUBLIC" AND THEIR INTERESTS, THE "PUBLIC INTEREST." AS
HERRING OBSERVED IN THE MIDST OF THE CRISIS OF THE GREAT DEPRESSION, THE CLASH
OF COMPETING INTEREST GROUPS DOES NOT NECESSARILY GUARANTEE THAT THE PUBLIC
INTEREST, EVEN IF IT CAN BE DEFINED, WILL BE PROTECTED. THEREFORE, WHILE
AMERICANS STRUGGLE WITH THE PROBLEMS OF ADMITTING NEW GROUPS INTO THE
DECISION-MAKING PROCESS, THEY MUST ALSO GUARD AGAINST SACRIFICING THOSE
INTERESTS WHICH WILL BEST SERVE THE NATION AS A WHOLE. THE LOCUS OF SOCIETAL
DECISION-MAKING AUTHORITY WILL SHIFT FURTHER AWAY FROM LOCAL SCHOOL DISTRICT
LEVELS TO STATE CAPITALS AND WASHINGTON, D.C.
FAMILIES CAN BE UNHEALTHY FOR CHILDREN AND OTHER LIVING THINGS. #ARLENE
SKOLNICK, PSYCHOLOGY TODAY 5(3), 18-22,104-106 AUG. 1971# THE MYTH OF THE FAMILY
BLINDS US TO DANGERS OF OUR NORMAL CHILD-CARE PRACTICES. THERE IS VERY LITTLE TO
PREVENT THE PARENT, IF HE IS SO INCLINED, FROM ACTING IN AN IRRATIONAL, UNFAIR,
CRUEL, ABUSIVE OR SIMPLY NEGLECTFUL MANNER TOWARD HIS CHILD. INDEED THE PARENT
IS LEGALLY EMPOWERED TO USE CORPORAL PUNISHMENT TO ENFORCE HIS RULES, NO MATTER
HOW ARBITRARY THEY MAY APPEAR TO THE CHILD OR TO OTHERS. IF THE PARENT SHOULD
KILL HIS CHILD IN THE COURSE OF ADMINISTERING A "DESERVED" BEATING, SOME STATES
WOULD CONSIDER THE EVENT AN EXCUSABLE HOMICIDE. AT THIS POINT IT MAY SEEM THAT
THE ARGUMENT HAS SLIPPED FROM A DISCUSSION OF PARENTS IN GENERAL TO AN EXTREME
KIND OF BEHAVIOR -- CHILD ABUSE AND MURDER. THIS BLURRING OF LINES IS INTENTIONAL.
THE LITERATURE ON BATTERED CHILDREN YIELDS ONE MAJOR CONCLUSION: THERE IS NO
CLEAR LINE OF DEMARCATION BETWEEN BATTERING PARENTS AND "NORMAL" ONES. NOTHING
SETS THESE PARENTS OFF AS A GROUP IN TERMS OF SOCIAL CLASS, OCCUPATION, I.Q.,
URBAN-RURAL RESIDENCE, OR PSYCHOPATHOLOGY. ALL THAT RESEARCH HAS FOUND IS A
PATTERN OF CHILD REARING THAT IS MERELY AN EXAGGERATION OF THE USUAL ONE.
BATTERING PARENTS EXPECT STRICT OBEDIENCE FROM VERY YOUNG CHILDREN; THEY HAVE A
CURIOUS SENSE OF RIGHTEOUSNESS; THEY FEEL THAT THEY ARE TEACHING THEIR CHILDREN
NOT TO BE SPOILED AND DISRESPECTFUL.
is designed to challenge popularly held views and the abstract reflects
this tone. A person writing an abstract of this article would probably
be tempted to make the abstract more objective in order to avoid the
possibility of making unsupported statements. This tendency would not
be desirable because the abstract would not convey the often inflammatory
tone of the original article. Here the consistency and objectivity
of the computer-based abstracting system allow the subjectivity of the
author to emerge. The data coefficient of 1.013 indicates that the
abstract is an adequate replacement for the original in a condensed
version.

The tenth sample abstract, shown in Figure 4.22, has the lowest data
coefficient of the set of sample abstracts. The abstract tends to
present disjoint ideas with no continuity between sentences. Intuitively,
this abstract also ranks lowest in the sample set. The original
document, entitled "Automating Medical Records", appeared in the
Delaware Medical Journal and was designed to acquaint physicians with
recent developments in automation of hospital record keeping. The
article presents a superficial survey of several developments and does
not have a cohesive theme. The concluding sentence of the original,
"It is important that all are aware of the developments in this field
now.", indicates the intent of the article to publicize several new
trends without actually reporting the details of these developments.
The structure and style of the original document appear to be the
significant limiting factor in the production of the abstract.
AUTOMATING MEDICAL RECORDS. #CAPT. CHARLES S. BURGER, DELAWARE MEDICAL JOURNAL
43(5), 127-129 (MAY, 1971).# TWO DEVELOPMENTS IN COMPUTER TECHNOLOGY HAVE ALSO
HASTENED DEVELOPMENT OF A COMPUTERIZED RECORD. THE FIRST IS THE TOUCH-SENSITIVE
CATHODE RAY TUBE INPUT DEVICE WHICH ALLOWS VERY RAPID INPUT OF MEDICAL DATA. THE
SECOND IS THE DEVELOPMENT OF "HIGH LEVEL" COMPUTER LANGUAGES. TO ALTER THE
MEDICAL CONTENT OF THE RECORD SYSTEM. THIS CAN BE LEARNED BY PHYSICIANS WITH
GREAT EASE WITHOUT HAVING TO KNOW THE DETAILS OF COMPUTER OPERATIONS. UTILIZING
THE FANTASTIC MEMORY AND RECALL CAPABILITY OF THE COMPUTER, WE CAN HOPEFULLY
ELIMINATE ROTE MEMORY FROM OUR MEDICAL EDUCATION SYSTEM. THE CAPABILITY OF
OBTAINING MULTIPLE PRINT-OUTS OF ALL OR PART OF THE MEDICAL RECORD COUPLED WITH
PROBLEM-ORIENTED ORGANIZATION OF THE DATA WILL ALLOW AUDIT OF PHYSICIAN
PERFORMANCE FOR THE PURPOSE OF DEFINING EDUCATIONAL NEEDS AND FOR MONITORING
QUALITY OF PATIENT CARE.
The eleventh example, "Distribution of Attained Service in Time-Shared
Systems", which is shown in Figure 4.23, is a technical article
which presents a detailed discussion of queuing theory. The original
article appeared in the Journal of Computer and System Sciences and
presented several equations. In the published version the equations
included Greek letters, superscripts, subscripts and proofs for the
equations. The input to the abstracting system was limited to the
characters on the keypunch so the input of equations resulted in a loss
of some of the data from the original article. The input of graphical
material and special characters will certainly be a possibility with
the development of advanced optical character recognition devices.
The abstract has a data coefficient of 1.064.

The twelfth example has a data coefficient of 1.500 which is the
highest of the sample set. This means that this abstract provides the
best indication of content in the least amount of length with respect
to its own document. The data coefficient does not measure relative
quality among abstracts of different documents. The data
coefficient can be used to indicate the best abstract among a set of
abstracts that all represent the same original document.
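
The comparison described above, choosing among several abstracts of one document, can be sketched as follows. This is only an illustration under stated assumptions: the data coefficient is taken to be the content fraction divided by the length fraction, the document counts are those reported for the "Water" example, and the second candidate (an edited version with the same data elements in fewer words) is entirely hypothetical. None of the names below come from ADAM.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate abstract of a single parent document."""
    name: str
    words: int
    data_elements: int


def data_coefficient(c: Candidate, doc_words: int, doc_elements: int) -> float:
    """Content fraction divided by length fraction (assumed definition)."""
    content = c.data_elements / doc_elements
    length = c.words / doc_words
    return content / length


def best_candidate(candidates, doc_words, doc_elements):
    """Return the candidate with the highest data coefficient."""
    return max(candidates, key=lambda c: data_coefficient(c, doc_words, doc_elements))


# Document counts as reported for the "Water" example.
DOC_WORDS, DOC_ELEMENTS = 2926, 283

candidates = [
    Candidate("as produced", words=287, data_elements=23),  # reported counts
    Candidate("edited", words=250, data_elements=23),       # hypothetical
]

best = best_candidate(candidates, DOC_WORDS, DOC_ELEMENTS)
for c in candidates:
    print(f"{c.name}: DC {data_coefficient(c, DOC_WORDS, DOC_ELEMENTS):.3f}")
print("best:", best.name)
```

Because both candidates share the same parent document, the comparison is the kind the text allows; ranking abstracts of different documents this way would not be meaningful.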

The twelfth example, shown in Figure 4.24, comes from an unexpected
source, Hot Rod Magazine, and is entitled "Auto Shop Series:
Carburetion". The original article contains 2338 total words and the
abstract contains only 25 words, or .011 of that number. The 25 words
in the abstract appear in one sentence which was the last sentence in
the original article. This sentence is really the author's own summary
DISTRIBUTION OF ATTAINED SERVICE IN TIME-SHARED SYSTEMS. #L. KLEINROCK, JOURNAL
OF COMPUTER AND SYSTEM SCIENCES 1, 287-298 (1967).# THE METHODS OF QUEUEING THEORY
HAVE BEEN APPLIED TO A NUMBER OF SUCH MODELS TO OBTAIN VARIOUS MEASURES OF
PERFORMANCE. IN THIS PAPER, WE CONSIDER A LARGE CLASS OF SUCH FEEDBACK QUEUEING
SYSTEMS AND OBTAIN, FOR ALL OF THESE SYSTEMS, A RESULT WHICH DESCRIBES THE
DISTRIBUTION OF ATTAINED SERVICE IN TERMS OF THE PREVIOUSLY SOLVED PERFORMANCE
MEASURES. THE FOREGROUND-BACKGROUND SYSTEM IS ANOTHER EXAMPLE OF A MEMBER OF OUR
CLASS. IN THIS SYSTEM, A NEW ARRIVAL JOINS THE FIRST QUEUE, OBTAINS A QUANTUM OF
SERVICE, AND THEN, IF MORE SERVICE IS REQUIRED, JOINS A SECOND QUEUE, ETC.,
JOINING THE NTH QUEUE ON HIS NTH VISIT TO THE SYSTEM OF QUEUES. THE SERVER
ALWAYS GIVES SERVICE TO THE LOWEST NUMBERED QUEUE AND PROCEEDS TO THE NTH QUEUE
ONLY IF THE N-1ST, ETC. QUEUES ARE EMPTY. THE SINGLE MOST SIGNIFICANT
PERFORMANCE FACTOR OF ANY QUEUEING SYSTEM IS THE AVERAGE TIME THAT A CUSTOMER
SPENDS WAITING IN QUEUES AS HE PASSES THROUGH THE SYSTEM. THE SERVICE FACILITY
IN SUCH A CASE IS CONSTANTLY CYCLING AMONG DIFFERENT CUSTOMERS IN A CONTINUOUS
WAY. IN A REAL SENSE, THEN, ALL CUSTOMERS PRESENT IN THE SYSTEM ARE USING A
FRACTION OF THE SERVICE CAPACITY ON A FULL-TIME BASIS. INDEED, THE FRACTION OF
THE MACHINE BEING USED BY A CUSTOMER FROM THE PTH PRIORITY GROUP AT SOME TIME T
WHO HAS AN ATTAINED SERVICE IN THE INTERVAL(TAU,TAU*OT) IS MERELY
a(TAU)/GAMMMAN(S)G<S) WHERE G(TAU)=LIM AS^O GOES TO. 0,OG G(PNCQ) ) . SUCH AN
OPERATING PROCEDURE MAY BE REFERRED TO AS A PROCESSOR-SHARED SYSTEM AND A
DISCUSSION OF ITS BEHAVIOR MAY BE FOUND JACM 14. THE USEFULNESS OF THIS LIMIT OF
PROCESSOR SHARING LIES IN ITS REPRESENTATION OF AN IDEALIZED SHARING OPERATION
IN WHICH SWAP TIME IS ASSUMED TO BE ZERO. THIS ASSUMPTION OF SWAP TIME IS AN
IMPORTANT SIMPLIFICATION IN THESE MODELS. THE RESULTS THUS OBTAINED ARE
IDEALIZED IN THE SENSE THAT NONZERO SWAP TIME CAN ONLY DEGRADE THE PERFORMANCE
OF SUCH A SYSTEM. MODELS WITH NONZERO SWAP TIME HAVE BEEN CONSIDERED IN THE
LITERATURE. THE RESULTS OF THIS PAPER GIVE GENERAL EXPRESSIONS FOR THE
DISTRIBUTION OF ATTAINED SERVICE FOR ANY MEMBER OF A WIDE CLASS OF TIME-SHARED
SYSTEMS, INCLUDING THOSE WITH PRIORITY INPUTS. THE ANSWERS ARE GOOD FOR FINITE
SERVICE QUANTA AS WELL AS FOR SERVICE QUANTA APPROACHING ZERO.
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'Figure 4.23 Computer-produceci 'abstract of "Distribution of Attained 
. * Service in Time-SRared Systems" 

AUTO SHOP SERIES: CARBURETION. #DR. DEAN HILL, HOT ROD MAGAZINE 24(10), 90-93
(19??).# FOR THE CASUAL, BUT INTERESTED, -LET'S MAKE IT RUN BETTER- PERSON,
CONSTANT MAINTENANCE, CARE AND ADJUSTMENT ARE NECESSARY FOR MAXIMUM RESULTS FROM
ANY CARBURETION SYSTEM.

                    Document    Abstract

Total Words           2338         25
Data Elements          187          3
Content                          .0160
Length                           .0107
DC                               1.500

Figure 4.24 Computer-produced abstract of "Auto Shop Series: Carburetion"

of the article and probably the best single sentence of the entire
article to express the main ideas. The sentence reflects the style and
informal tone of the entire article. It is not traditional to consider
abstracts of articles from such sources as Hot Rod Magazine, but it is
clearly possible to apply the abstracting system in diverse areas and
achieve excellent results.

The thirteenth example, which is from an article which appeared in
Fortune, is shown in Figure 4.25. In the article, "A Computer Version
of How a City Works", the author, John F. Kain, writes a critical
review of the work of Jay W. Forrester. The abstract reflects Kain's
opinions, which are positive on some aspects of Forrester's work and
negative on others. The paragraph is coherent and provides a good
representation of the original. It has a data coefficient of 1.202
for .173 of the content in .144 of the length of the original.

The final example is shown in Figure 4.26 and is an article from
Scientific American. The article on "The Origins of Hypodermic
Medication" provides a survey with anecdotes, quotations, and
illustrations of the history of this medical practice. The original
article contained 4297 total words and the abstract represents .061 of
that length. The abstract is short enough to be included in a single
paragraph but it does not provide a completely adequate reflection of
the document's content. The style in which the original article was
written seems to be the limiting factor in the application of the
abstracting system.

A COMPUTER VERSION OF HOW A CITY WORKS. #JOHN F. KAIN, FORTUNE 80(6), 241-242
(1969).# FORRESTER, A PROFESSOR AT M.I.T.'S SLOAN SCHOOL OF MANAGEMENT, RELIES
ON A COMPUTER MODEL HE DEVELOPED TO SIMULATE THE GROWTH, DECLINE, AND STAGNATION
OF A HYPOTHETICAL CITY FROM BIRTH TO OLD AGE (250 YEARS). SUCH METHODS HAVE A
GREAT DEAL OF POTENTIAL FOR THE ANALYSIS OF URBAN PROBLEMS AND HAVE ALREADY
DEMONSTRATED THEIR VALUE IN A NUMBER OF SPECIFIC, THOUGH LIMITED APPLICATIONS.
HOWEVER, THE DEVELOPMENT OF TRULY USEFUL AND TRUSTWORTHY URBAN SIMULATION MODELS
REMAINS A DISTANT OBJECTIVE AND WILL REQUIRE MUCH GREATER RESOURCES THAN HAVE
YET BEEN DEVOTED TO THE TASK. BEFORE ADEQUATE MODELS BECOME AVAILABLE, MANY
INADEQUATE ONES WILL BE PUT FORWARD. FORRESTER'S MODEL IS A CONSPICUOUS EXAMPLE.
IN HIS FIRST CHAPTER FORRESTER WARNS THE READER THAT CAUTION SHOULD BE EXERCISED
IN APPLYING THE MODEL TO ACTUAL SITUATIONS. SUBSEQUENTLY, HOWEVER, HE EXPRESSES
FEW RESERVATIONS ABOUT THE MODEL'S VALIDITY AND FREELY USES IT AS A BASIS FOR
PRESCRIBING PUBLIC POLICY. THE INFLUENCE OF TAX RATES ON EMPLOYMENT AND
POPULATION STRUCTURE IN FORRESTER'S CITY IS POWERFUL AND PERVASIVE.
-MANAGERIAL-PROFESSIONAL- AND -LABOR- FAMILIES ARE ASSUMED TO BE REPELLED BY
HIGH TAX RATES, WHEREAS THE UNDEREMPLOYED ARE INDIFFERENT TO THEM. HIGH TAX
RATES, MOREOVER, DISCOURAGE THE FORMATION OF NEW ENTERPRISES AND ACCELERATE THE
AGING OF EXISTING ONES. THERE ARE STILL OTHER ADVERSE EFFECTS: HIGH TAXES RETARD
CONSTRUCTION OF BOTH PREMIUM AND WORKER HOUSING, WHICH IN TURN DISCOURAGES THE
KINDS OF PEOPLE WHO LIVE IN THESE KINDS OF HOUSING FROM MOVING INTO THE CITY OR
REMAINING THERE.

                    Document    Abstract

Total Words           1699        244
Data Elements          139         24
Content                          .1727
Length                           .1436
DC                               1.202

Figure 4.25 Computer-produced abstract of "A Computer Version of How a
            City Works"
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nDiriM<: nF HYPClOERMIC HEOICATIOM. iNORHAN HOWAHO-JONES.SCItNTIFIC AMERICAN 
V^l.^T iTlol MANUARY 197l).» TH6 AVAILABILITY OF COMPOUNDS THAT-MERE HIGHLY 
IcMl" IN HiSutI WAS I STIMULUS TO THE SEARCH FOR NEW METHOOS OF 

i5"lX!sTlRING T^t„.°?ATER ANOTHbR ARROW POISON. CURARE HAS TO BECOME A VALUABLE 
N mJoERN ANisTHESIOLOGY.) THF NOTION OF ADMINISTERING ™«0^^H Thk 

ciin rAINEn GROUND SLOWLY. IN 1S59, AhTER HOOD'S METHOD HAD BECOME KNOHM, tHE 
FRE^H ^HYSICuS^ JoUIS JULis ecHlER PUBLISHED AN ACCOUNT OF HIS OWN EXPERIENCE 
C?T^^ HYPOOERM C MEDICATION. Ht HAD USED A SYR'INGE THAT DIFFERED .FROM THE ONt 
M^LOYEr BY wSd. AnS HE CABLED IT THE -PRAVAZ APPARATUS.- "*Y*^ "f "^ISl^AJt 
^VDC ne ^YBIurF TC INJECT CHEHICAL COAGULANTS INTO THE BLOOD V=SSELS OF ANIMALS 
ILr NeSer FOR HYPMERMIC MEDICATION EITHER IN HIS PATIENTS OR IN ANIMALS. 
^^lER^S USE OF PRAVZZ- NAME LEO T^ THE BELIEF THAT IT HAS HE WHO HAD INVENTED 
?^fhyIdDERMIC SYRINGE AND INTRODUCED HYPODERMIC MEDICATION INTO PRACTICE. IN 
S^hIbRITIW PHYSICIAN J. M. CROMBIE. GIVING AS HIS JUSTIFICATION -THE 



5;,LK ^THREADS SLoGlY OrIwN THrSuEh tSu PERF^RAn^NS IH THE SKIN BY MEANS OF A 
fm^AHlF NEEDLE THE NEW ALKALOIDS HAD BEEN DISCOVERED? IN THE CASE. OF MEDICINES 

™us iLS"?o fSr gIneJal effects, he called the method 'HYPODERMIC- to 

m^TlNGUlSH IT FROM THE ENDERMIC. AND FROM THE LOCAL INJECTION OF HOOD.... THE 
SipmERM rolJFERrFRDM THE -METHOD OF WOOD.' THE LATTER PLAN HAS FDR ITS OBJECT 
™f LnC^L JrEA^MENT OF A LOCAL y»!=FECTION.- UNTIL THEN HOOD HAD NOT CONTESTED 
.^NTpk"^ ClIiM l?f T^is STATEMEN GOADED HIM INTO A REPLY THAT HAS TO LEAD TO 
rTcRIMoSloTs* ASo'uroiGN^EtED PU.' EXCHANGE OF^CORRESPDNDENCE IT TOOK TH^ 
LONi FDR THE COMPLICATIONS ATTEk: NO HYPDOEIMIC MEDICATION TO BE HIDE 
ACKNOHLEDGEOiBY THE MEDICAL PROFESSION.-^ 



                    Document    Abstract

Total Words           4297        264
Data Elements          349         19
Content                          .0544
Length                           .0614
DC                               0.886

Figure 4.26 Computer-produced abstract of "The Origins of Hypodermic
            Medication"

8.3. Summary of Results of the Experimental Application of the
     Evaluation Criterion

For each of the fourteen sample abstracts, content, length, and
data coefficient are tabulated in Table 4.3. The Content ranges from a
low of .016 to a high of .237 with an average of .129. The Length
ranges from a low of .011 to a high of .205 with an average of .123.
The values of the data coefficient range from 0.757 to 1.500 with an
average of 1.063. For the six documents that had fewer than 2000 total
words, examples number 2, 3, 5, 6, 10 and 13, the average length of the
abstract was .143 of the original and the average data coefficient was
1.071. For the four documents that had between 2000 and 2500 total words,
examples number 4, 8, 11, and 12, the average length was .114 of the
original and the average data coefficient was 1.045. For the four
documents with more than 2500 words, examples number 1, 7, 9, and 14,
the average length was .103 of the original and the average data
coefficient was 1.006. These data show that the abstracting system
produces relatively shorter abstracts for longer documents. They also
show that the abstracts of longer documents are of poorer quality,
although they are still above the minimum level of acceptability.

The average data coefficient for the sample abstracts produced by
the abstracting system was 1.063. This indicates that the system
produces abstracts which are acceptable, but not outstanding. The
abstracts had low data coefficients because they were too long
for the quantity of data retained. It would be possible to improve
these abstracts by incorporating a procedure to edit the output from

Table 4.3 Summary of results of the examination of fourteen computer-
          produced abstracts

     Example                              Data
     Number      Content     Length    Coefficient

        1         .0813       .0981        .829
        2         .1897       .2048        .926
        3         .0938       .0861       1.088
        4         .1019       .0907       1.124
        5         .2374       .1638       1.450
        6         .1402       .1472        .953
        7         .2121       .1867       1.137
        8         .1676       .1750        .958
        9         .0646       .0638       1.014
       10         .0857       .1133        .757
       11         .1892       .1773       1.064
       12         .0160       .0107       1.500
       13         .1727       .1436       1.202
       14         .0544       .0614        .886

     Average      .1290       .1231       1.063

the abstracting system. The improvement should be based on a reduction
in length without a corresponding reduction in content. The following
chapter provides some techniques for the improvement of abstracts based
on this conclusion.

CHAPTER V. IMPROVEMENT OF THE ABSTRACTING SYSTEM

1. Improvement of the Quality of Abstracts through Sentence
   Modification

1.1. Rationale for Sentence Modification

Most existing automatic abstracting systems only select sentences
from the original document to form an extract. But sentences which
are well-suited to the document in their original context may not be
well-suited for inclusion in the abstract because of the new context in
which they are found. Frequently, the set of selected sentences provides
only a set of disjoint ideas in unrelated sentences. Wyllys considers
this property of extracts to be a major disadvantage.

     The most serious disadvantage of current computer-produced
     abstracts is that they consist of individual sentences of
     the original text, extracted according to one or more
     criteria. Not only do the extraction criteria require
     further research, but the resulting set of individual
     sentences presents problems of disjointness, incompleteness,
     redundancy, and the like. The ultimate goal of
     research in automatic abstracting is to enable a computer
     program to "read" a document and "write" an abstract of
     it in conventional prose style, but the path to this goal
     is full of unconquered obstacles. (1)

Conventional prose style dictates that appropriate connectives and
transitions between ideas be provided so that the reader is not required
to guess at what these should be. An abstract may be difficult to
understand if the ideas it expresses are not presented and connected in
a logical way. As Needleman puts it, unity of a document requires,
primarily, a judicious assembling or blending of ideas and details; and
coherence is achieved when these details or ideas are so arranged and
so worded that there is a clear, continuous, logical progression of
thought from sentence to sentence (2).

     The work reported in Section 1 of this chapter was carried out with
     C. E. Young. A paper entitled "Improvement of Automatic Abstracts
     by the Use of Structural Analysis," reports the results of this
     research and has been submitted for publication in the Journal of
     the American Society for Information Science. I thank Miss Young
     for permitting me to include this material in my dissertation.

In order to provide a logical progression of thought within an
abstract, sentences should be related to the preceding and following
sentences, or a clear transition of thought should be provided. The
relationship between sentences may be expressed by the repetition of
keywords and by the use of connectives to bridge the gaps in thought
between sentences. Not every statement demands a transitional link,
but a statement may often need to be connected in some way with those
in juxtaposition with it. This is especially so when sentences appear
to be loosely connected, yet have real sequences of ideas that should
be brought into clear relation by the use of stylistically strategic
transitional devices.

The primary goal in the development of ADAM was to construct an
algorithm which would delete all sentences from the original document
that were not worthy of inclusion in the abstract, while retaining
those sentences that did not satisfy the deletion criteria. In ADAM,
each sentence of the original document is first evaluated individually.
If the sentence is considered worthy of inclusion in the abstract, but
requires an antecedent, the three sentences preceding the selected
sentence are examined to determine whether they should also be included
in the abstract. The requirement of an antecedent is established if
the selected sentence contains certain function words or phrases, such
as "for that reason", "hence", "in that case", "their", "these", "they",
"this" and "those". Such a criterion provides a good method of finding
sentences which should be included in an abstract because of their
relationship to other sentences selected for inclusion through
application of the principal selection criteria. There are, however,
occasions when this technique is not effective. Those include cases
in which the related sentences are separated by more than three sentences,
or in which intersentence relationships are indicated by devices other
than the function words or phrases normally employed. These occasions
occur frequently enough to warrant the development of other methods for
the detection of intersentence references. Three alternative situations
may arise in attempts to handle intersentence references. If a
sentence is judged abstract-worthy and it contains an indication of
intersentence reference, then

     a. the related sentences must be found in the document and
        included in the abstract; or
     b. if the related sentences cannot be found, the selected
        sentence must be rewritten to make it stand alone; or
     c. if neither of these cases applies, the selected sentence
        must be deleted.

The problems associated with the location of intersentence referents
and with the rewriting of sentences of the original document, and
possible approaches to their solution, are considered in the next
section.
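
The antecedent test described in this section can be sketched as
follows. This is a minimal illustration in Python, not ADAM's own code:
the signal list contains only the words and phrases quoted above, the
function names are hypothetical, and the sketch merely returns the
preceding sentences as candidates, since how each candidate is then
judged is not restated here.

```python
import re

# Function words and phrases quoted in the text as signalling that a
# selected sentence requires an antecedent.
ANTECEDENT_SIGNALS = [
    "for that reason", "hence", "in that case",
    "their", "these", "they", "this", "those",
]
_SIGNAL_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(s) for s in ANTECEDENT_SIGNALS) + r")\b",
    re.IGNORECASE,
)


def needs_antecedent(sentence):
    """True if the sentence contains one of the listed signals."""
    return _SIGNAL_RE.search(sentence) is not None


def antecedent_candidates(sentences, index, window=3):
    """Return up to `window` sentences preceding sentences[index] when
    that sentence needs an antecedent, otherwise an empty list."""
    if not needs_antecedent(sentences[index]):
        return []
    return sentences[max(0, index - window):index]
```

The fixed window of three reproduces the limitation noted above: a
related sentence that lies more than three sentences back is never
examined.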



1.2. The Structural-Analytic Approach to Sentence Modification

In this research, procedures have been developed for the modification
of sentences initially selected by ADAM (all processes to be described
are performed on the abstract, not the original document). These
procedures include

     a. a method of identification of words and phrases of
        potential importance to the abstract;
     b. a means of retention or deletion of certain clauses or
        sentences;
     c. a method for the revision of and, in some instances, the
        creation of sentences.

These procedures have been developed using the notions of structural
analysis (3) and employing a number of programs that facilitate this
analysis.


The structural approach to linguistic analysis employed in this
research is based upon the notions of Fries (3), who criticized the
traditional approach to grammatical analysis for its lack of consistency
of definition of grammatical classes. To Fries, grammatical classes
should be defined on the basis of usage, rather than on the basis of
"meaning". Thus, four main classes of words were defined based upon
the positions the words could occupy in one or more simple frame
sentences. For instance, the sentence

     The concert was good.

serves as a frame such that any word that can replace 'concert' in the
frame is a member of Class 1, while any word that can replace 'was' is
a member of Class 2. Similarly, words that can replace 'good' in the
sentence frame are members of Class 3. Class 4 words are those that
can replace 'there' in the sentence frame

     The team went there.
■ • 

These four classes of words correspond roughly to the traditional
classes 'noun', 'verb', 'adjective' and 'adverb', but the classes are
defined strictly on the basis of structure.

In addition to the four classes mentioned above, Fries defined a
fifth class of words which is composed of fifteen groups of specific
elements. Some of these groups include prepositions, conjunctions,
pronouns, auxiliary verbs and determiners (the articles, among others).
This fifth class, called function words, differs from the other four
in that its members

     a. serve as relators between groups of words in the other
        four classes, as well as structural signals within the
        sentence;
     b. have no meaning ascribed to themselves, but must be known
        as specific items;
     c. constitute a small set of elements (Fries identifies 154)
        which generally account for between 45% and 50% of all
        word occurrences in a given body of text (4).

Using the ideas of Fries as a basis, a program has been devised that
identifies the class to which each word in an English sentence belongs.
This program, called MYRA, has been described elsewhere (5). It served
as one of the principal tools of the present work, and its use will
become obvious shortly.

Two additional language analysis programs which have been used
in this study include a phrase analysis program and a clause analysis
program. The phrase program:

     1. identifies and isolates noun, verb, preposition and
        adjective phrases;
     2. identifies the head word of each phrase;*
     3. identifies the subject and object of each sentence.**

     *  The head word of a phrase is that word which all other words
        in the phrase modify.
     ** The program referred to contains an implementation of case
        grammar (5). The major case assignments are comparable to the
        traditional notions of subject and object.

The clause program identifies and isolates each clause within a sentence.
The operation of the three programs mentioned above can be illustrated
by means of a simple example. The sentence

     The thief ran from the police.

contains the three function words 'the', 'from' and 'the'. The first
'the' signals the fact that either an adjective or a noun must follow.
The preposition 'from' introduces a noun phrase, and the second 'the'
causes 'police' to be identified as a noun. Assuming that 'thief' were
identified as an adjective initially, and that 'ran' were identified as
a noun, MYRA would note that the string of words contained no verb, so
that an attempt to reassign some of the words would be made. In this
example, the only logical place for a verb to occur is before the
preposition. Hence, 'ran' would be reassigned as a verb and 'thief'
would be classed as a noun. The results of this analysis form the
input to the phrase identification program.

The phrase identification program interprets the determiner 'the'
as initiating a noun phrase. The program continues to scan to the
right in the sentence until a noun is located. The phrase is thus
delimited, and the head word (the noun) is also identified. Other types
of phrases are similarly identified and delimited.
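
The rightward scan for noun phrases can be illustrated with a short
Python sketch. It assumes that word classes have already been assigned
(as MYRA is described as doing) and are supplied as (word, tag) pairs;
the tag names and the function name are hypothetical, and only phrases
opened by a determiner are handled.

```python
def noun_phrases(tagged):
    """Find noun phrases opened by a determiner.

    `tagged` is a list of (word, tag) pairs; the tag names are
    illustrative. Returns a list of (phrase_words, head_word) tuples.
    """
    phrases = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] == "DET":
            j = i + 1
            # Scan to the right until a noun closes the phrase.
            while j < len(tagged) and tagged[j][1] != "NOUN":
                j += 1
            if j < len(tagged):
                words = [word for word, _ in tagged[i:j + 1]]
                phrases.append((words, tagged[j][0]))  # the noun is the head
                i = j + 1
                continue
        i += 1
    return phrases


sentence = [("The", "DET"), ("thief", "NOUN"), ("ran", "VERB"),
            ("from", "PREP"), ("the", "DET"), ("police", "NOUN")]
print(noun_phrases(sentence))
# [(['The', 'thief'], 'thief'), (['the', 'police'], 'police')]
```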

The clause identification program depends, for its proper
functioning, upon the grouping of the phrases that follow the verb
phrase(s), and upon the identification of coordinate conjunctions,
subordinate conjunctions, relative pronouns and words such as 'that'
and 'there'. This program is based upon the definition of a clause as
a sequence of words that contains one and only one predicate. This
definition is due to Cook (6). Consider, for example, the sentence

     The company admitted that they were wrong.

Two verb phrases would be recognized by the phrase identification
program, namely, 'admitted' and 'were'. The structural signal that
differentiates the two clauses in this sentence is the word 'that'.
Therefore, the two clauses are

     The company admitted
     (that) they were wrong.

The three programs described briefly above are the basic analytical
tools used in this research. A complete description of the programs
is given in (5, 7). Their application will be made evident in the next
section.

1.3. Rules for Sentence Modification

Five specific rules for effecting modifications on sentences of
an abstract have been developed, based upon data provided by the
structural-analytic programs mentioned in the preceding section. These
rules are

     1. Combination of sentences by means of a coordinate conjunction.
     2. Combination of sentences by means of a subordinate conjunction.
     3. Modification of sentences by means of a graphical reference
        transformation.
     4. Creation of sentences by means of a reference tabulation.
     5. Revision or deletion of sentences for context modification.

These rules will be described in detail in the following sections. The
symbology employed in these descriptions is defined in Figure 5.1.


1.3.1. Combination of Sentences

The primary criterion used to determine whether sentences might
be suitably combined is that of parallelism. For two sentences to

     NONP       Noun phrase
     VRBP       Verb phrase
     PRPP       Prepositional phrase
     PRN        Pronoun phrase
     ...        Continuation
     ...P'      Modified phrase
     ...P_1     Phrase required for the application of a rule
     ...P_a     Phrase which is not necessary for specific
                consideration in the application of a rule

Figure 5.1 Symbology employed in the description of rules for sentence
           modification.

read smoothly in combination, there must be parallelism of structure
and continuity of thought. Two sentences are said to have parallel
structures if they have the same ordering of dependent and independent
clauses and a similar ordering of phrases (by type). The strict
parallelism in the ordering of sentence elements is relaxed in the case
of prepositional phrases, wherein both the number and order of the
phrases may differ among the sentences combined.

For a determination of continuity of thought, it has been found
adequate to test for identical main verbs or for identical subjects
in the two sentences under consideration. Such a test does not, of
course, in the case of non-identical construction, mean that continuity
of thought does not exist, but such identity insures that the sentences
will be appropriately combined.

A second criterion that can be used to determine whether two
sentences should be combined is that of sentence complexity. Sentences
that are quite long and of complex construction (such as the last
sentence of the previous paragraph) are more difficult to read than
shorter, simpler sentences and they are generally undesirable in an
abstract. Thus two sentences would not be combined if one or both
contain too many clauses. I have stipulated that sentences may not be
combined if they have more than one independent or more than one
dependent clause.

1.3.1.1. Combination of Sentences by Means of a Coordinate Conjunction

Three alternative rules have been developed for combining sentences
by means of a coordinate conjunction. These rules can be applied to
any pair of sentences which pass tests for similarity of structure and
for similarity in their subject and verb phrases. Every pair of
sentences must satisfy the following structural rules:

     1. Determine that each sentence under consideration has only
        one independent clause and at most one dependent clause.

     2. Determine that the sentences are of parallel construction.
        The order and type of clause must be the same in both
        sentences and there must be similarity of phrase structure.

Each rule has a set of associated similarity tests. For each rule, the
following tests must be made:

     Rule 1.a - identical main verb phrases

     Rule 1.b - identical subject noun phrases

     Rule 1.c - identical subject noun phrases and identical main
                verb phrases

     Rule 1.d - identical head word of the subject noun phrases
                and identical main verb phrases.
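
The structural rules and similarity tests above can be summarized in a
small Python sketch. It is an illustration under stated assumptions
rather than a description of the actual programs: the `Analysis` record
and both function names are hypothetical, the check for similarity of
phrase structure is omitted, and Rule 1.a is read as requiring identity
of the main verbs only, while Rules 1.c and 1.d compare the full verb
phrases.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Hypothetical summary of the structural analysis of one sentence."""
    clause_pattern: tuple   # e.g. ("independent", "dependent")
    subject: str            # subject noun phrase
    subject_head: str       # head word of the subject noun phrase
    verb_phrase: str        # main verb phrase, including auxiliaries
    main_verb: str          # main verb only


def structurally_eligible(a, b):
    """Structural rules 1 and 2: clause limits and parallel clause order."""
    for s in (a, b):
        if s.clause_pattern.count("independent") != 1:
            return False
        if s.clause_pattern.count("dependent") > 1:
            return False
    return a.clause_pattern == b.clause_pattern


def matching_rules(a, b):
    """Names of the similarity tests (1.a to 1.d) that the pair passes."""
    if not structurally_eligible(a, b):
        return []
    same_subject = a.subject.lower() == b.subject.lower()
    same_head = a.subject_head.lower() == b.subject_head.lower()
    same_main_verb = a.main_verb == b.main_verb
    same_verb_phrase = a.verb_phrase == b.verb_phrase
    rules = []
    if same_main_verb:
        rules.append("1.a")
    if same_subject:
        rules.append("1.b")
    if same_subject and same_verb_phrase:
        rules.append("1.c")
    if same_head and same_verb_phrase:
        rules.append("1.d")
    return rules
```

A pair may pass several tests at once; the sketch reports all of them
and leaves the choice of rule to the caller.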

The most general of the rules is that which requires identity of
the main verbs of both sentences. Auxiliaries and adverbs contained in
the verb phrase may vary from sentence to sentence; only the main verbs
must be the same. For instance, the verb phrases 'can provide' and
'cannot generally provide' satisfy the criterion of identical main verb
and so would permit the combining of the sentences. To clarify the
issues so far discussed, consider the following sentences.*

     'Thus, the profession of medicine provides services
     which utilize its knowledge of anatomy, physiology,
     neurology, pathology, and such areas.'

     'The legal profession provides services which utilize
     its special knowledge of jurisprudence.'

Once the classes to which the words in the sentences belong have been
determined, the clause program would identify two clauses in each
sentence. For the first sentence, these are

     'Thus, the profession of medicine provides services'

and

     'which utilize its knowledge of anatomy, physiology,
     neurology, pathology, and such areas.'

The second clause is identified as a dependent clause since it begins
with the relative pronoun 'which'. An identical clause construction
is found in the second sentence. Application of the phrase analysis
program to these sentences yields a determination that the phrase
structure of the sentences is similar. The complete analysis of these
two sentences is shown in Figure 5.2. Since the main verb in the two
sentences is the same, the sentences meet all the criteria for
combination. Using the conjunction 'and' for this purpose, we obtain
the new sentence

     'Thus, the profession of medicine provides services which
     utilize its knowledge of anatomy, physiology, neurology,
     pathology, and such areas and the legal profession provides
     services which utilize its special knowledge of jurisprudence.'

     * All sample sentences are taken from abstracts produced by ADAM.

The rule for combining sentences that have identical main verbs may be
more generally expressed as follows.

     S_1 = NONP_a VRBP_1 NONP_b PRN_c VRBP_d NONP_e
       +
     S_2 = NONP_f VRBP_2 NONP_g PRN_h VRBP_i NONP_j

     S_3 = NONP_a VRBP_1 NONP_b PRN_c VRBP_d NONP_e 'AND'
           NONP_f VRBP_2 NONP_g PRN_h VRBP_i NONP_j

A second rule for combining sentences, one slightly less general than
the above rule, requires the presence of identical subjects in the
sentences under consideration. If the noun phrases that precede the
main verb of a pair of sentences match word for word, then the noun
phrase of the second sentence is deleted and the remaining sentence
fragment is combined with the first sentence by means of a coordinate
conjunction. In the two sentences

     'The system exceeded the capacity of its
     present auxiliary equipment.'

and

     'The system was modified for further testing.'

the subject noun phrases are found to be identical, so the subject
of the second sentence is deleted and the remainder combined with
the first to yield the sentence

     'The system exceeded the capacity of its present
     auxiliary equipment and was modified for further
     testing.'

Expressed symbolically, this process is

     S_1 = NONP_1 VRBP_a NONP_b
       +
     S_2 = NONP_1 VRBP_c NONP_d

     S_3 = NONP_1 VRBP_a NONP_b 'AND' VRBP_c NONP_d
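
The identical-subject combination can be imitated at the string level
with a few lines of Python. This is only a sketch: it assumes the
subject noun phrase has already been identified by the phrase program
and is passed in, and the function name is hypothetical.

```python
def combine_identical_subject(first, second, subject):
    """Join two sentences that open with the same subject noun phrase.

    The subject is removed from the second sentence and the remainder
    is attached to the first with the coordinate conjunction 'and'.
    Returns None when either sentence does not start with `subject`.
    """
    prefix = subject + " "
    if not (first.startswith(prefix) and second.startswith(prefix)):
        return None
    remainder = second[len(prefix):]
    return first.rstrip(".") + " and " + remainder


print(combine_identical_subject(
    "The system exceeded the capacity of its present auxiliary equipment.",
    "The system was modified for further testing.",
    "The system",
))
# The system exceeded the capacity of its present auxiliary equipment
# and was modified for further testing.  (printed on one line)
```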

The third rule for combining sentences is the most specific of
the three because it requires that two sentences have identical subject
noun phrases and identical verb phrases if they are to be combined.
The following sentences satisfy this criterion.

     'The experiment resulted in a modification of the
     original hypothesis.'

     'The experiment resulted in a change of our basic
     approach.'

Both of these sentences pass the tests for parallelism in clause and
phrase structure, and both contain subject and verb phrases that match
word for word. Thus, both the subject and verb phrases are deleted
from the second sentence and the remaining fragment is combined with
the first sentence by use of a coordinating conjunction, yielding the
sentence

     'The experiment resulted in a modification of the original
     hypothesis and in a change of our basic approach.'
This sentence combination process can be expressed symbolically as
follows.

     S_1 = NONP_1 VRBP_1 PRPP_a
       +
     S_2 = NONP_1 VRBP_1 PRPP_b

     S_3 = NONP_1 VRBP_1 ... 'AND' ...

A variation of the third rule is the fourth rule, which can be
employed when a single noun phrase as subject is present in both
sentences under consideration. This rule requires only that the head
word of each subject noun phrase be identical, in addition to identity
of verb phrases, in order for the sentences to be combined. In this
case, however, both the subject modifiers (except determiners) and the
sentence predicates must be conjoined by a conjunction. The word
"respectively" is added at the end of the sentence to maintain the
proper logical ordering. The following sentences serve for illustration.
Application of the rule just described to the sentences

     'Individual manufacturers offer ALGOL, BASIC, and
     FOCAL compilers.'

     'Cost manufacturers offer programming support on an
     individually negotiated contract basis.'

yields the sentence

     'Individual and cost manufacturers offer ALGOL, BASIC,
     and FOCAL compilers, and programming support on an
     individually negotiated contract basis, respectively.'
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In applying this rule, the verb must b<s converted to plural form if 
it is not already plural. If the verb ph:/ase is initiated by an 
auxiliary verb (a form of the verb 'to be^ or 'to have') the pluraliz- 
ation process is straightforward. The number the varb is detersriined 

comoarins: ^hp ..iL^^^^^jo^ containing th6 inflected 

forms of these verbs. If the auxiliary In the sentence matches one 
in the dictionary that is of aingv.ldi' form, the verb phrase of .the 
sentence is appropriately pLuralised. 

If an auxiliary verb does not initiate the verb phrase, and
provided the subject of the sentence is in the third person, the
pluralization process is based upon the following observations rather
than on dictionary look-up. If a verb ends in 'ies' and contains
fewer than five characters, it is pluralized by dropping the 's'.
Thus verbs like 'lies' and 'dies' would be appropriately transformed
into their plural forms. If a verb ending in 'ies' is five or more
characters in length, then the 'ies' is changed to 'y' to yield the
plural. 'Tries' and 'flies' would thus be satisfactorily pluralized.

If the verb ends in 'oes' or 'ses', as in 'goes' and 'stresses',
then the 'es' is dropped to form the plural. Finally, if none of
these rules applies and the verb ends in 's', then the letter 's' is
deleted to form the plural.
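
The suffix rules for verbs without an auxiliary translate almost directly into code. The following Python sketch is offered only as an illustration of the rules as stated above; the function name pluralize_verb is hypothetical, and the dictionary look-up used for auxiliary verbs is not modelled.

```python
def pluralize_verb(verb: str) -> str:
    """Convert a third-person singular verb to its plural form by suffix rules."""
    if verb.endswith("ies"):
        if len(verb) < 5:
            # Short verbs such as 'lies' and 'dies': drop the final 's'.
            return verb[:-1]
        # Longer verbs such as 'tries' and 'flies': change 'ies' to 'y'.
        return verb[:-3] + "y"
    if verb.endswith(("oes", "ses")):
        # 'goes' and 'stresses': drop the 'es'.
        return verb[:-2]
    if verb.endswith("s"):
        # Any other verb ending in 's': delete the 's'.
        return verb[:-1]
    # The verb is taken to be plural already.
    return verb


for word in ("lies", "dies", "tries", "flies", "goes", "stresses", "offers"):
    print(word, "->", pluralize_verb(word))
```

The order of the tests matters: the 'ies' cases must be examined before the general 's' case, since every verb covered by the earlier rules also ends in 's'.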

If the verb is singular, then the head word of the subject noun
phrase must be pluralized as well as the verb phrase. The
pluralization of nouns is not so easy a task as is the pluralization
of verbs. Nevertheless, programs for the purpose have been developed
which produce satisfactory results; I have chosen to use the procedures
developed by Petrarca and Lay (8).

After the nouns and verbs have been pluralized, a final check for
the indefinite articles 'a' and 'an' is made. If these articles are
found then they are deleted. The sentences are thus made ready for
combination, a process depicted formally as follows (where asterisks
denote sentence elements potentially modified by the pluralization
process or by deletion).

     S_1 = ADJP_1 NONP_1 VRBP_1 NONP_a

     S_2 = ADJP_2 NONP_1 VRBP_1 NONP_b

     S_3 = ADJP_1* 'AND' ADJP_2* NONP_1* VRBP_1* NONP_a* 'AND' NONP_b*
           ', respectively'
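
A minimal Python sketch of the fourth rule follows. It rests on several assumptions that are not part of the description above: the sentences are taken to be parsed already into subject modifier, head word, verb phrase and object; the head word and verb are taken to be plural already, so that pluralization is omitted; and the deletion of 'a' or 'an' is applied only to the start of the subject modifier. The names Clause, strip_indefinite_article and combine_respectively are hypothetical.

```python
from typing import NamedTuple, Optional


class Clause(NamedTuple):
    """A pre-parsed sentence (the parse itself is assumed, not shown)."""
    modifier: str  # ADJP, the modifiers of the subject
    head: str      # head word of the subject noun phrase
    verb: str      # VRBP, the main verb phrase
    obj: str       # NONP, the object noun phrase


def strip_indefinite_article(phrase: str) -> str:
    """Delete a leading 'a' or 'an', which no longer fits a plural subject."""
    words = phrase.split()
    if words and words[0].lower() in ("a", "an"):
        words = words[1:]
    return " ".join(words)


def combine_respectively(s1: Clause, s2: Clause) -> Optional[str]:
    """Apply the fourth rule; return None when the rule does not apply."""
    if s1.head.lower() != s2.head.lower() or s1.verb.lower() != s2.verb.lower():
        return None
    first_modifier = strip_indefinite_article(s1.modifier)
    second_modifier = strip_indefinite_article(s2.modifier).lower()
    return (f"{first_modifier} and {second_modifier} {s1.head} {s1.verb} "
            f"{s1.obj}, and {s2.obj}, respectively.")


first = Clause("Individual", "manufacturers", "offer",
               "ALGOL, BASIC, and FOCAL compilers")
second = Clause("Cost", "manufacturers", "offer",
                "programming support on an individually negotiated contract basis")
print(combine_respectively(first, second))
```

For the two sample sentences the sketch reproduces the combined sentence given earlier, ending in 'respectively'.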

1.3.1.2. Combination of Sentences by Means of a Subordinate Conjunction

A second general way of combining sentences is to find two whose
structure is such that one sentence can be made a subordinate clause
of the other. Procedures have been developed which search the abstract
for sentences that can be thus combined. As in the case of sentence
combinations effected through use of a coordinate conjunction, the
application of this procedure requires that the candidate sentences
contain only one independent and one dependent clause. Furthermore,
the dependent clause must not begin with 'who' or 'which'.

This rule combines two sentences into a sentence with an
independent clause and a subordinate clause. In order to determine
if a sentence can be made a subordinate clause, the subject of the
sentence is compared with every other noun phrase in the abstract
(including noun phrases which are the object of a preposition). If a
match is found, then the sentence in which the noun phrase occurs
becomes the independent clause. As an illustration of this rule,
consider the following sentences.

     'A set of consecutive storage locations is called a
     memory block.'

     'A memory block is labelled by a single word called
     a codeword.'



The sentences are formally represented as:

     S_1 = NONP_a VRBP_b NONP_1

     S_2 = NONP_1 VRBP_c PRPP_d VRBP_e NONP_f

The identical phrase in the sentences is NONP_1 ('a memory block').
Sentence 2 will become a subordinate clause of sentence 1 because
NONP_1 is the subject of this sentence. NONP_1 is deleted from sentence
2, and is replaced either by 'who' or by 'which'. The relative pronoun
'who' is associated only with nouns that have human attributes. Thus
a personification test is made. This is done by checking the
inflectional ending of the noun. If the word ends in 'ist(s)' or
'ian(s)', and if the word has more than 5 letters, as in scientist,
physicians and librarian, then the noun is flagged with the human
attribute, and the relative pronoun 'who' is added. If the noun is not
flagged, the relative pronoun 'which' is used. This rule is not adequate
to detect all nouns that have human attributes, for example, names of
individuals, but it provides an adequate means of deciding between
'which' and 'who' in most cases.

The new sentence has the form:

     S_3 = NONP_a VRBP_b NONP_1 ', which' VRBP_c PRPP_d VRBP_e NONP_f

For the sample sentences given above, the new sentence is:

     'A set of consecutive storage locations is called a memory
     block, which is labelled by a single word called a codeword.'
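
The personification test and the final assembly of the sentence can be sketched as follows. The sketch is illustrative only and assumes that the matching noun phrase has already been located and removed from the second sentence; the names relative_pronoun and combine_subordinate are hypothetical.

```python
def relative_pronoun(noun: str) -> str:
    """Choose 'who' or 'which' from the inflectional ending of the noun."""
    word = noun.lower()
    human_ending = word.endswith(("ist", "ists", "ian", "ians"))
    # The noun is flagged as human only if it also has more than 5 letters.
    if human_ending and len(word) > 5:
        return "who"
    return "which"


def combine_subordinate(main_sentence: str, shared_head: str, remainder: str) -> str:
    """Attach the remainder of sentence 2 to sentence 1 as a subordinate clause.

    main_sentence ends with the shared noun phrase; remainder is sentence 2
    with that noun phrase (its subject) already deleted.
    """
    pronoun = relative_pronoun(shared_head)
    return f"{main_sentence.rstrip('.')}, {pronoun} {remainder}."


print(combine_subordinate(
    "A set of consecutive storage locations is called a memory block.",
    "block",
    "is labelled by a single word called a codeword",
))
```

Short words such as 'list' end in 'ist' but fail the length condition and therefore receive 'which', which is the purpose of the letter-count test.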


The rules for combination of sentences by means of a coordinate or
subordinate conjunction are summarized in Figure 5.3.

1. Combination of sentences by means of a coordinate conjunction

   a. Identical main verb phrases

        S_1 = NONP_a VRBP_1 NONP_b PRN_c VRBP_d NONP_e

                   +

        S_2 = NONP_f VRBP_2 NONP_g PRN_h VRBP_i NONP_j

        S_3 = NONP_a VRBP_1 NONP_b PRN_c VRBP_d NONP_e 'AND'
              NONP_f VRBP_2 NONP_g PRN_h VRBP_i NONP_j

   b. Identical subject noun phrases

        S_1 = NONP_1 VRBP_a NONP_b

                   +

        S_2 = NONP_1 VRBP_c NONP_d

        S_3 = NONP_1 VRBP_a NONP_b 'AND' VRBP_c NONP_d

   c. Identical subject noun phrases and identical main verb phrases

        S_1 = NONP_1 VRBP_1 PRPP_a

                   +

        S_2 = NONP_1 VRBP_1 PRPP_b

        S_3 = NONP_1 VRBP_1 PRPP_a 'AND' PRPP_b

   d. Identical head word of the subject noun phrases and identical
      main verb phrases

        S_1 = ADJP_1 NONP_1 VRBP_1 NONP_a

                   +

        S_2 = ADJP_2 NONP_1 VRBP_1 NONP_b

        S_3 = ADJP_1* 'AND' ADJP_2* NONP_1* VRBP_1* NONP_a* 'AND' NONP_b*
              ', respectively'

2. Combination of sentences by means of a subordinate conjunction

        S_1 = NONP_a VRBP_b NONP_1

        S_2 = NONP_1 VRBP_c PRPP_d VRBP_e NONP_f

        S_3 = NONP_a VRBP_b NONP_1 ', which' VRBP_c PRPP_d VRBP_e NONP_f

Figure 5.3 Summary of rules for combination of sentences by means
           of a coordinate or subordinate conjunction

1.3.2. The Graphical Reference Rule

It is not desirable in an abstract to refer to a specific graph,
figure or table contained in the document. If, however, such
references do find their way into the abstract, they can be readily
identified and replaced by a general descriptive reference. Specific
references may be identified through the use of a small dictionary of
words signalling such references and by applying certain contextual
tests. The dictionary includes such words as table, tab., figure, fig.,
graph, etc., and the contextual tests include the identification,
following one of these words, of a) a string of digits, b) a single
character, or c) a Roman numeral. If a contextual test is satisfied,
the number or character following the dictionary word is deleted and
the initial word of the phrase is identified. If the initial word is
'the', this word is deleted. Finally, the article 'a' or 'an' is
inserted (if not already present) at the beginning of the phrase.
Sentences such as

     S_1: 'Table 2 presents nine areas of endeavor and their
          associated disciplines.'

     S_2: 'Figure 2 presents graphically the general model of
          information transfer.'

are modified to a general reference:

     S_1: 'A table presents nine areas of endeavor and their
          associated disciplines.'

     S_2: 'A figure presents graphically the general model of
          information transfer.'

by application of this rule.
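
The graphical reference rule lends itself to a pattern-matching sketch. The Python fragment below is an illustration under stated assumptions rather than the procedure used in ADAM: the dictionary is limited to the five words listed above, the three contextual tests are expressed as one regular expression, and the capitalization of the inserted article is decided by a simple check for the start of a sentence. The names REFERENCE and generalize_references are hypothetical.

```python
import re

# Optional leading article, a dictionary word, then one of the contextual
# tests: a) a string of digits, c) a Roman numeral, b) a single character.
REFERENCE = re.compile(
    r"\b(?:(?i:the|an|a)\s+)?"
    r"((?i:table|tab\.|figure|fig\.|graph))\s+"
    r"(\d+|[IVXLCDM]+|[A-Za-z])(?![A-Za-z0-9])"
)


def generalize_references(sentence: str) -> str:
    """Replace specific references such as 'Table 2' by 'a table'."""

    def replace(match: "re.Match[str]") -> str:
        word = match.group(1).lower()
        before = sentence[:match.start()].rstrip()
        starts_sentence = before == "" or before[-1] in ".!?"
        # Every dictionary word begins with a consonant, so 'a' suffices.
        article = "A" if starts_sentence else "a"
        return f"{article} {word}"

    return REFERENCE.sub(replace, sentence)


print(generalize_references(
    "Table 2 presents nine areas of endeavor and their associated disciplines."))
print(generalize_references("See the Table IV for details."))
```

A sentence that already contains only a general reference, such as 'A table presents ...', is left unchanged, because none of the contextual tests is satisfied by the word that follows 'table'.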

1.3.3. Reference Tabulation 

Literature references contained in a document are not included in
an abstract, yet the number and kind of reference may indicate
something of the strength of the paper. A means has therefore been
devised of tabulating the references in a paper, based initially upon
the format used by the Journal of the American Society for Information
Science, and of presenting this data in a sentence of the abstract.
The procedure developed can be described as follows.

The heading designated by 'References' is identified in the text.
Next set N = 1, and search for the character string 'bN.b', where b
indicates a blank. A variable REFERENCE_FLAG is set to zero. After
'b1.b' is located, REFERENCE_FLAG is set to 1, N is incremented by 1,
and the next consecutive number in the string 'b2.b' is sought.
This process is continued until the text is exhausted or until another
heading, 'Appendix', is encountered. At this time, N is decremented by
one. The REFERENCE_FLAG is checked and if it is equal to zero, the
sentence 'No references are given.' is appended to the abstract. If
REFERENCE_FLAG is non-zero, the sentence 'N references are given.' is
appended to the abstract. This procedure is specific to journals that
have this format for references, but modifications of this basic
technique could be added to the system to allow for varying formats.
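
The tabulation procedure can be sketched in Python as follows. The sketch makes three assumptions that go beyond the description: line breaks are first normalized to single blanks so that the blank-delimited search string can be found, a document without a 'References' heading is treated as having no references, and the sentence for a zero flag is taken to read 'No references are given.' The function name reference_sentence is hypothetical.

```python
def reference_sentence(text: str) -> str:
    """Count numbered references and return the sentence for the abstract."""
    start = text.find("References")
    body = text[start:] if start != -1 else ""
    # Stop at the next heading, 'Appendix', if one is present.
    end = body.find("Appendix")
    if end != -1:
        body = body[:end]
    # Assumption: collapse all whitespace to single blanks before searching.
    body = " ".join(body.split())

    n = 1
    reference_flag = 0
    position = 0
    while True:
        # Search for blank, N, period, blank.
        found = body.find(f" {n}. ", position)
        if found == -1:
            break
        reference_flag = 1
        position = found + 1
        n += 1
    n -= 1  # N now holds the number of the last reference located.

    if reference_flag == 0:
        return "No references are given."
    return f"{n} references are given."


sample = ("Closing remarks.\nReferences\n1. Smith, J. First paper.\n"
          "2. Jones, K. Second paper.\n3. Lee, M. Third paper.\n"
          "Appendix\n4. Not a reference.")
print(reference_sentence(sample))
```

Because each search resumes from the position of the previous match, the numbers must occur in ascending order, which mirrors the sequential search for consecutive values of N.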


1.3.4. Context Modification 

Sentences sometimes appear in an abstract which refer to sentences
that appear in the original document but not in the abstract. Such
sentences have been found generally to contain phrases of ordinal
designation such as 'the second ...' or 'the first ...' before the
main verb of the sentence. This is the case, for example, in the
following sentences.

     'The second mechanism is structural change: ...'

     'The second is that reactions of oxygen atoms in the low
     temperature region tend to be more stereoscopic with
     trans- than with cis-olefins.'

     'The first is the H12 developed by Honeywell.'

A sentence that contains an ordinal number, n, in the first phrase
requires at least n - 1 antecedent sentences, the first of which
indicates the points enumerated. If a sentence exhibits the fault
of referring to a sentence not in the abstract, the required antecedents
could be searched for in the original document and, when found, added
to the abstract. The time and effort involved in such a search would
not, however, generally be acceptable. Hence, the following procedure
has been developed for handling sentences of an abstract whose leading
phrase(s) contain words which demand an antecedent.

If a sentence of an abstract contains a leading phrase which has
an ordinal number as a component, the abstract is searched backward
from this sentence to find sentences that contain lower ordinal numbers
or else an appropriate cardinal number. For example, to complete the
antecedent relationships for either of the first two sample sentences
above, sentences must be found earlier in the abstract which contain
the phrase 'the first' and the adjective 'two', respectively. If
the required antecedent sentences are not found, then the sentence is
handled in one of the following ways. If the ordinal number serves
as an adjective in the sentence, then it can be deleted and the
determiner 'the' is changed to 'a' or 'an', if necessary. The first
example sentence above would become

     'A mechanism is structural change: ...'

If, on the other hand, the ordinal number serves as a noun in
the sentence in question, then the sentence is either deleted, as in
the case of the third example sentence above, or else, if the
construction of the sentence follows the pattern

     'The (ordinal number) is that ...',

the portion of the sentence up to and including 'that' is deleted and
the rest of the sentence is allowed to remain. Thus, the second example
sentence above would become

     'Reactions of oxygen atoms in the low temperature region
     tend to be more stereoscopic with trans- than with
     cis-olefins.'
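
The three outcomes of the context modification rule can be illustrated with a short Python sketch. It covers only the case in which the backward search has already failed to find the required antecedents, and it replaces the syntactic analysis with a crude assumption: the ordinal is treated as a noun when it is followed directly by a form of 'to be', and as an adjective otherwise. The list of ordinals and the name modify_context are likewise hypothetical.

```python
import re
from typing import Optional

ORDINAL = r"(?:first|second|third|fourth|fifth)"
IS_THAT = re.compile(rf"^The {ORDINAL} is that\s+", re.IGNORECASE)
AS_NOUN = re.compile(rf"^The {ORDINAL} (?:is|are|was|were)\b", re.IGNORECASE)
AS_ADJECTIVE = re.compile(rf"^The {ORDINAL}\s+(\w)", re.IGNORECASE)


def modify_context(sentence: str) -> Optional[str]:
    """Rewrite a sentence whose ordinal lacks an antecedent.

    Returns the modified sentence, or None if the sentence is to be deleted.
    """
    match = IS_THAT.match(sentence)
    if match:
        # Delete everything up to and including 'that'; keep the rest.
        rest = sentence[match.end():]
        return rest[:1].upper() + rest[1:]
    if AS_NOUN.match(sentence):
        # Ordinal used as a noun outside the 'is that' pattern: delete.
        return None
    match = AS_ADJECTIVE.match(sentence)
    if match:
        # Ordinal used as an adjective: drop it and change 'the' to 'a'/'an'.
        article = "An" if match.group(1).lower() in "aeiou" else "A"
        return f"{article} {sentence[match.start(1):]}"
    return sentence


print(modify_context("The second mechanism is structural change: ..."))
print(modify_context("The first is the H12 developed by Honeywell."))
```

The 'is that' pattern is tested first because it is a special case of the noun usage; testing the general noun pattern first would delete sentences that the rule is meant to preserve.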



1.4. Quality Improvements Viewed According to the Evaluation Criterion

I have described several methods for improving the readability of
abstracts produced by the abstracting system. These methods are designed
to decrease the number of words needed to express a concept and to make
the resulting abstracts more readable and coherent. These methods can
be viewed in light of the evaluation criterion to study their effect on
the quality of the final product. In this section I will present three
examples of the results of the application of the rules for sentence
modification.

The first example is the abstract of a chemistry article, "Addition
of Oxygen Atoms to Olefins at Low Temperatures" which appeared in the
Journal of Physical Chemistry. The abstract of this article, before
any modification, appears in Figure 5.4. This abstract contains .158
of the content of the original in .167 of the length and has a data
coefficient of 0.942. This abstract could be edited to reduce its
length and improve its data concentration. Application of the sentence
modification rules to this abstract results in the abstract shown in
Figure 5.5.

ADDITION OF OXYGEN TO OLEFINS AT LOW TEMPERATURE IV REARRANGEMENTS. #R. KLEIN
AND M.D. SCHEER, JOURNAL OF PHYSICAL CHEMISTRY 74(3), 613-616(1970). A
CONSIDERATION OF THE OXYGEN ATOM ADDITION TO CIS- AND TRANS-2-BUTENE IN THE
TEMPERATURE REGION 77 TO 113 K LED TO THE FORMULATION OF A NEW TRANSITION
INTERMEDIATE. IN THIS INTERMEDIATE, THE OXYGEN ATOM IS REPRESENTED AS BOUND IN
A LOOSE, THREE-MEMBERED RING WITH, AND IN THE PLANE OF, THE OLEFINIC STRUCTURE
OF THE REACTANT. OBSERVATIONS ON 2-BUTENES HAVE BEEN EXTENDED TO SEVERAL MORE
STRAIGHT-CHAIN, INTERNAL OLEFINS IN THE LOW-TEMPERATURE REGION. COMPARISON OF
THE TRANS-EPOXIDE TO KETONE RATIONS FROM THE CIS- VS THE TRANS-OLE FIN WITH
INCREASING SIZE OF THE OLEFIN INDICATES THAT THESE RATIOS DIVERGE. THE SECOND
IS THAT REACTION OF OXYGEN ATOMS IN THE LOW-TEMPERATURE REGION TENDS TO BE MORE
STEROSCOPIC WITH TRANS- THAN WITH CIS-OLEFINS. CARBONYL COMPOUNDS CONSTITUTE A
SIZEABLE FRACTION OF THE PRODUCTS OF THE OXYGEN ATOM ADDITION TO OLEFINS IN THE
LOW-TEMPERATURE REGION AND, AS HAS BEEN NOTED, AN INTRAMOLECULAR GROUP MIGRATION
IS REQUIRES FOR CARBONYL FORMATION. THE PRINCIPAL CARBONYL PRODUCT IN THE
TRANS-2-BUTENE REACTION AT 90 K IS 2-BUTANONE. THE FORMATION OF THIS KETONE
REQUIRES THE MIGRATION OF H. COMPARED TO THE MIGRATION OF THE METHYL GROUP,
THAT OF H IS SLIGHTLY FAVORED. CIS-2-BUTENE IS NOT USEFUL FOR THE COMPARISON,
AS BOTH OF THE HYDROGEN ATOMS ATTTACHED TO THE OLEFINIC CARBON PAIR ARE
SUPPRESSED THROUGH INTERACTION WITH OXYGEN IN THE COMPLEX. THE RELATIVE
QUANTITIES OF 2-BUTANONE TO ISOBUTYRALDEHYDE IS TAKEN AS A MEASURE OF THE RATIO
OF MIGRATION OF THE HYDROGEN ATOM TO THE METHYL GROUP. REACTIONS WERE EFFECTED
AT 90 K IN THE APPARTATUS ROUTINELY USED FOR THIS PURPOSE. THE OLEFINS WERE
DILUTED 10 TO 1 WITH PROPANE. THE EXPOSURE TIME OF OXYGEN ATOMS WAS 5 MINUTES,
AND ABOUT 1% OF THE OLEFIN WAS REACTED. THE PRODUCTS WERE DETERMINED AT 135 AND
A HELIUM FLOW OF 100 CC/MINUTE. THE CIS AND TRANS ISOMERS OF
3,4-EPOXY-3,4-DIMETHYLHEXANE WERE NOT SEPARABLE. LOCALIZTION OF THE OXYGEN ATOM
IN THE TRANSITION COMPLEX PRECEDING ALKYL GROUP REARRANGEMENT IS NOT IN ACCORD
WITH THE EXPERIMENTAL RESULTS. AT 90 K, THE RATIO OF ADDITION TO C-2 IS 16
TIMES THAT TO C-3. FOR MEP, ADDITION OF THE OXYGEN ATOM TO THAT CARBON ATOM OF
THE DOUBLE BOND TO WHICH THE TWO METHYL GROUPS ARE ATTACHED WOULD BE EXPECTED TO
BE FAVORED.

Figure 5.4 Computer-produced abstract of "Addition of Oxygen Atoms to
           Olefins at Low Temperatures".

The only rule that was applicable in this case was the rule
for context modification. The sentence

     'The second is that reaction of oxygen atoms in the low-
     temperature region tends to be more stereoscopic with
     trans- than with cis-olefins.'

appears in the abstract but the required antecedent sentences are not
contained in the abstract. The sentences which preceded this sentence
did not contain the phrase 'the first' and the adjective 'two'. Since
the ordinal number, in this case 'second', appears in the pattern

     'The (ordinal number) is that ...'

the portion of the sentence up to and including 'that' is deleted and
the rest of the sentence is allowed to remain. Thus the sentence becomes

     'Reaction of oxygen atoms in the low temperature region
     tends to be more stereoscopic with trans- than with
     cis-olefins.'

This modification results in a reduction in length of the abstract
by the removal of the four words. This reduction causes the improved
abstract to represent .165 of the length instead of .167. This
reduction in length changes the value of the data coefficient to 0.953.
This modification does bring the abstract closer to the minimum level
of acceptability, but there is still a need for additional improvements
in this abstract.

ADDITION OF OXYGEN TO OLEFINS AT LOW TEMPERATURE IV REARRANGEMENTS. #R. KLEIN
AND M.D. SCHEER, JOURNAL OF PHYSICAL CHEMISTRY 74(3), 613-616(1970). A
CONSIDERATION OF THE OXYGEN ATOM ADDITION TO CIS- AND TRANS-2-BUTENE IN THE
TEMPERATURE REGION 77 TO 113 K LED TO THE FORMULATION OF A NEW TRANSITION
INTERMEDIATE. IN THIS INTERMEDIATE, THE OXYGEN ATOM IS REPRESENTED AS BOUND IN
A LOOSE, THREE-MEMBERED RING WITH, AND IN THE PLANE OF, THE OLEFINIC STRUCTURE
OF THE REACTANT. OBSERVATIONS ON 2-BUTENES HAVE BEEN EXTENDED TO SEVERAL MORE
STRAIGHT-CHAIN, INTERNAL OLEFINS IN THE LOW-TEMPERATURE REGION. COMPARISON OF
THE TRANS-EPOXIDE TO KETONE RATIONS FROM THE CIS- VS THE TRANS-OLE FIN WITH
INCREASING SIZE OF THE OLEFIN INDICATES THAT THESE RATIOS DIVERGE. REACTION OF
OXYGEN ATOMS IN THE LOW-TEMPERATURE REGION TENDS TO BE MORE STEROSCOPIC WITH
TRANS- THAN WITH CIS-OLEFINS. CARBONYL COMPOUNDS CONSTITUTE A SIZEABLE FRACTION
OF THE PRODUCTS OF THE OXYGEN ATOM ADDITION TO OLEFINS IN THE LOW-TEMPERATURE
REGION AND, AS HAS BEEN NOTED, AN INTRAMOLECULAR GROUP MIGRATION IS REQUIRES FOR
CARBONYL FORMATION. THE PRINCIPAL CARBONYL PRODUCT IN THE TRANS-2-BUTENE
REACTION AT 90 K IS 2-BUTANONE. THE FORMATION OF THIS KETONE REQUIRES THE
MIGRATION OF H. COMPARED TO THE MIGRATION OF THE METHYL GROUP, THAT OF H IS
SLIGHTLY FAVORED. CIS-2-BUTENE IS NOT USEFUL FOR THE COMPARISON, AS BOTH OF THE
HYDROGEN ATOMS ATTTACHED TO THE OLEFINIC CARBON PAIR ARE SUPPRESSED THROUGH
INTERACTION WITH OXYGEN IN THE COMPLEX. THE RELATIVE QUANTITIES OF 2-BUTANONE
TO ISOBUTYRALDEHYDE IS TAKEN AS A MEASURE OF THE RATIO OF MIGRATION OF THE
HYDROGEN ATOM TO THE METHYL GROUP. REACTIONS WERE EFFECTED AT 90 K IN THE
APPARTATUS ROUTINELY USED FOR THIS PURPOSE. THE OLEFINS WERE DILUTED 10 TO 1
WITH PROPANE. THE EXPOSURE TIME OF OXYGEN ATOMS WAS 5 MINUTES, AND ABOUT 1% OF
THE OLEFIN WAS REACTED. THE PRODUCTS WERE DETERMINED AT 135 AND A HELIUM FLOW
OF 100 CC/MINUTE. THE CIS AND TRANS ISOMERS OF 3,4-EPOXY-3,4-DIMETHYLHEXANE
WERE NOT SEPARABLE. LOCALIZTION OF THE OXYGEN ATOM IN THE TRANSITION COMPLEX
PRECEDING ALKYL GROUP REARRANGEMENT IS NOT IN ACCORD WITH THE EXPERIMENTAL
RESULTS. AT 90 K, THE RATIO OF ADDITION TO C-2 IS 16 TIMES THAT TO C-3. FOR
MEP, ADDITION OF THE OXYGEN ATOM TO THAT CARBON ATOM OF THE DOUBLE BOND TO WHICH
THE TWO METHYL GROUPS ARE ATTACHED WOULD BE EXPECTED TO BE FAVORED.

Figure 5.5 Modified abstract of "Addition of Oxygen Atoms to Olefins at
           Low Temperatures".

The second example is the abstract of the article "Storage
Organization in Programming Systems" which appeared in the Communications
of the Association for Computing Machinery. The original abstract,
shown in Figure 5.6, has a data coefficient of 1.094. This abstract
can be further improved by application of the combination of sentences
by means of a subordinate conjunction rule and the graphical reference
rule. Application of these rules results in the abstract shown in
Figure 5.7.

STORAGE ORGANIZATION IN PROGRAMMING SYSTEMS. #JANE G. JODEIT: COMMUNICATIONS OF
THE ACM. INTRODUCTION. IN THIS PAPER A REPRESENTATION OF DATA AND PROGRAMS IN
STORAGE THAT CONTRIBUTES ORGANIZATIONAL SIMPLICITY, CODING CONVENIENCE, AND
FUNCTIONAL VERSATILITY IN PROGRAMMING SYSTEMS IS DESCRIBED. HERE PROGRAMMING
SYSTEM MEANS THE REALIZATION OF A PROBLEM SOLUTION ON A COMPUTER, ANYTHING FROM
MATHEMATICAL ANALYSIS TO LANGUAGE TRANSLATION. A PROBLEM SOLUTION IS DEFINED BY
A COLLECTION OF ENTITIES, PROGRAMS AND DATA ITEMS SPECIFICALLY. THE GENERIC TERM
FOR SUCH AN ENTITY IS AN ARRAY. EACH ARRAY IS NAMED AND CONTAINS AS ELEMENTS
DATA OR SUBARRAYS. THE SET OF BRANCHES FROM A SINGLE SOURCE IS REPRESENTED BY A
BLOCK CONTAINING CODEWORDS OR DATA AS APPROPRIATE. A CODEWORD WHICH CORRESPONDS
TO A SIMPLE NAME, AS A ABOVE, BUT NOT (A,I) IS CALLED A PRIMARY CODEWORD. ALL
SUBARRAYS AND DATA ELEMENTS OF AN ARRAY ARE ADDRESSED "RELATIVE" TO THE SIMPLE
NAME. THIS JUST MEANS THAT THE MTH ELEMENT OF THE NTH SUBARRAY IN THE ARRAY DATA
IS NAMED DATA(N,M); IT HAS NO OTHER DESIGNATION. THE SET OF PRIMARY CODEWORDS
THEN COMPLETELY CATALOGS THE ENTITIES OF A PROGRAMMING SYSTEM AND ALL ADDRESSING
IS DONE THROUGH THESE CODEWORDS. THE OPERATING SYSTEM PROVIDES DYNAMIC
ALLOCATION OF BLOCKS AND MAINTENANCE OF CODEWORDS. PRIMARY CODEWORDS NEVER MOVE,
AND THE ADDRESSING IS INDEPENDENT OF SYSTEM COMPOSITION AND STORAGE
ALLOCATION. CODEWORDS AS BLOCK LABELS AND THEIR USE IN ADDRESSING. A SET OF
CONSECUTIVE STORAGE LOCATIONS IS CALLED A MEMORY BLOCK. EVERY SUCH BLOCK IS
LABELED BY A SINGLE WORD CALLED A CODEWORD. AS REALIZED ON THE RICE UNIVERSITY
COMPUTER THE GENERAL CODEWORD FORMAT IS SHOWN IN FIGURE 1, WHERE L IS THE LENGTH
OF THE BLOCK LABELED BY THE CODEWORD C; I IS THE RELATIVE ADDRESS OF THE FIRST
WORD IN THE BLOCK LABELED BY C; THE PORTION OF A CODEWORD USED IN INDIRECT
ADDRESSING IS DESIGNED TO BE USED WITH THE HARDWARE DEFINITION OF THE RICE
COMPUTER. IF *I IS ON, RETURN TO STEP (1) FOR CODEWORD CI+1 AT LEVEL I+1. IF *I
IS NOT ON, USE CI+1 AS FINAL ADDRESS AND DO NOT ITERATE. THE LOCATION OF VT AND
THE ORDER OF VT ENTRIES IS A FUNCTION OF SYSTEM COMPOSITION. INITIAL LOADING OF
PROGRAMS AND DATA IS JUST A SEQUENCE OF ACTIVATIONS, AND THE BLOCKS WILL BE
SEQUENTIALLY LOCATED IN THE STORAGE DOMAIN. AS A RUN PROGRESSES, BLOCKS MAY BE
INACTIVATED AND NEW ONES ACTIVATED, SO THE GENERAL STATE OF THE STORAGE DOMAIN
IS A MIXTURE OF ACTIVE AND INACTIVE BLOCKS. EACH ACTIVE BLOCK IN THE STORAGE
DOMAIN IS LABELED BY A CODEWORD, WHICH MAY ITSELF BE A WORD IN AN ACTIVE BLOCK
OF CODEWORDS. ALREADY THE STORAGE CONTROL OPERATIONS OF BLOCK CREATION TO FORM
ARRAYS AND FREEING OF ARRAYS HAVE BEEN MENTIONED. THE IMPLEMENTATION ON THE RICE
COMPUTER PROVIDES A REPRESENTATION IN PRIMARY STORAGE WHICH IS IMMEDIATELY
APPLICABLE THROUGH A HIERARCHY OF STORAGE DEVICES. THE DESCRIPTIVE PROPERTIES OF
CODEWORDS, THE MODULARITY OF ARRAY STORAGE, AND THE PROTECTION POTENTIAL IN THE
SYSTEM ALLOW THE CODEWORD STORAGE ORGANIZATION TO BE APPLIED IN A
MULTIPROGRAMMING ENVIRONMENT. AN INTERRUPT WOULD ALLOW INTERVENTION FOR
RETRIEVAL. STRUCTURED ARRAYS HAVE BEEN DESIGNED FOR SECONDARY STORAGE FILES.

Figure 5.6 Computer-produced abstract of "Storage Organization in
           Programming Systems".

In the original abstract the following two sentences appeared.

     'The generic term for such an entity is an array.'

     'Each array is named and contains as elements data or subarrays.'

These two sentences can be combined by application of the combination
of sentences by means of a subordinate conjunction rule to generate the
following sentence.

     'The generic term for such an entity is an array, which is
     named and contains as elements data or subarrays.'

A similar application of the rule combines the two sentences

     'A set of consecutive storage locations is called a
     memory block.'

     'Every such block is labeled by a single word called
     a codeword.'

to form the following sentence:

     'A set of consecutive storage locations is called a memory
     block, which is labeled by a single word called a codeword.'

The graphical reference rule should be applied to the following
sentence.

     'As realized on the Rice University computer the general
     codeword format is shown in Figure 1, ...'

Since the figure is not included in the abstract the sentence would be
modified to read

     'As realized on the Rice University computer the general
     codeword format is shown in a figure, ...'

STORAGE ORGANIZATION IN PROGRAMMING SYSTEMS. #JANE G. JODEIT: COMMUNICATIONS OF
THE ACM. IN THIS PAPER A REPRESENTATION OF DATA AND PROGRAMS IN
STORAGE THAT CONTRIBUTES ORGANIZATIONAL SIMPLICITY, CODING CONVENIENCE, AND
FUNCTIONAL VERSATILITY IN PROGRAMMING SYSTEMS IS DESCRIBED. HERE PROGRAMMING
SYSTEM MEANS THE REALIZATION OF A PROBLEM SOLUTION ON A COMPUTER, ANYTHING FROM
MATHEMATICAL ANALYSIS TO LANGUAGE TRANSLATION. A PROBLEM SOLUTION IS DEFINED BY
A COLLECTION OF ENTITIES, PROGRAMS AND DATA ITEMS SPECIFICALLY. THE GENERIC
TERM FOR SUCH AN ENTITY IS AN ARRAY, WHICH IS NAMED AND CONTAINS AS ELEMENTS
DATA OR SUBARRAYS. THE SET OF BRANCHES FROM A SINGLE SOURCE IS REPRESENTED BY A
BLOCK CONTAINING CODEWORDS OR DATA AS APPROPRIATE. A CODEWORD WHICH
CORRESPONDS TO A SIMPLE NAME, AS A ABOVE, BUT NOT (A,I) IS CALLED A PRIMARY
CODEWORD. ALL SUBARRAYS AND DATA ELEMENTS OF AN ARRAY ARE ADDRESSED "RELATIVE"
TO THE SIMPLE NAME. THIS JUST MEANS THAT THE MTH ELEMENT OF THE NTH SUBARRAY IN
THE ARRAY DATA IS NAMED DATA(N,M); IT HAS NO OTHER DESIGNATION. THE SET OF
PRIMARY CODEWORDS THEN COMPLETELY CATALOGS THE ENTITIES OF A PROGRAMMING SYSTEM
AND ALL ADDRESSING IS DONE THROUGH THESE CODEWORDS. THE OPERATING SYSTEM
PROVIDES DYNAMIC ALLOCATION OF BLOCKS AND MAINTENANCE OF CODEWORDS.
PRIMARY CODEWORDS NEVER MOVE, AND THE ADDRESSING IS INDEPENDENT OF SYSTEM
COMPOSITION AND STORAGE ALLOCATION. A SET OF CONSECUTIVE STORAGE
LOCATIONS IS CALLED A MEMORY BLOCK, WHICH IS LABELED BY A SINGLE WORD CALLED A
CODEWORD. AS REALIZED ON THE RICE UNIVERSITY COMPUTER THE GENERAL CODEWORD
FORMAT IS SHOWN IN A FIGURE, WHERE L IS THE LENGTH OF THE BLOCK LABELED BY THE
CODEWORD C; I IS THE RELATIVE ADDRESS OF THE FIRST WORD IN THE BLOCK LABELED BY
C. THE PORTION OF A CODEWORD USED IN INDIRECT ADDRESSING IS DESIGNATED TO BE
USED WITH THE HARDWARE DEFINITION OF THE RICE COMPUTER. IF *I IS ON, RETURN TO
STEP (1) FOR CODEWORD CI+1 AT LEVEL I+1. IF *I IS NOT ON, USE CI+1 AS FINAL
ADDRESS AND DO NOT ITERATE. THE LOCATION OF VT AND THE ORDER OF VT ENTRIES IS A
FUNCTION OF SYSTEM COMPOSITION. INITIAL LOADING OF PROGRAMS AND DATA IS JUST A
SEQUENCE OF ACTIVATIONS, AND THE BLOCKS WILL BE SEQUENTIALLY LOCATED IN THE
STORAGE DOMAIN. AS A RUN PROGRESSES, BLOCKS MAY BE INACTIVATED AND NEW ONES
ACTIVATED, SO THE GENERAL STATE OF THE STORAGE DOMAIN IS A MIXTURE OF ACTIVE AND
INACTIVE BLOCKS. EACH ACTIVE BLOCK IN THE STORAGE DOMAIN IS LABELED BY A
CODEWORD, WHICH MAY ITSELF BE A WORD IN AN ACTIVE BLOCK OF CODEWORDS. ALREADY
THE STORAGE CONTROL OPERATIONS OF BLOCK CREATION TO FORM ARRAYS AND FREEING OF
ARRAYS HAVE BEEN MENTIONED. THE IMPLEMENTATION OF THE RICE COMPUTER PROVIDES A
REPRESENTATION IN PRIMARY STORAGE WHICH IS IMMEDIATELY APPLICABLE THROUGH A
HIERARCHY OF STORAGE DEVICES. THE DESCRIPTIVE PROPERTIES OF CODEWORDS, THE
MODULARITY OF ARRAY STORAGE, AND THE PROTECTION POTENTIAL IN THE SYSTEM ALLOW
THE CODEWORD STORAGE ORGANIZATION TO BE APPLIED IN A MULTIPROGRAMMING
ENVIRONMENT. AN INTERRUPT WOULD ALLOW INTERVENTION FOR RETRIEVAL. STRUCTURED
ARRAYS HAVE BEEN DESIGNED FOR SECONDARY STORAGE FILES.

Figure 5.7 Modified abstract of "Storage Organization in Programming
           Systems".

These improvements result in a reduction in length from .136 of
the original article to .135 and an improvement of the data coefficient
from 1.094 to 1.106. The abstract also seems to be more cohesive and
unified without as many short, disjoint sentences.

The third example illustrates that in some cases the application of 

the improvement procedures will not increase the value of th.e data « 

coefficient, but the value of the data coefficient will not decrease. 

The third sample abstract, shown in Figure 5.8, is of the article, 

"Mini-computers Turn Classic" which appeared in Data Processing . .Only 

one rule, the combinp.tion of sentences with identical, headwords" of the 

'subject noun phrases and identical main verb phrases by means of a 

cjordin;2te conjunction, was applicable. This rule was applied to the 

Last tvo sentences of the abstract which are 

'individual manufacturers offer ALGOL, BASIC, and FOCAL 
compilers.* 

*Cost manufacturers offer programming support on an • « 

individually negotiated contract bas*is.* 

to yield the sentence 

'individual and cost manufacturers offer ALGOL, BASIC, and 
FOGAI compilers, and programming support* on an individually 
negotiated contract basis, respectively.' 

The resulting sentence is not any shorter than the two original sentences 

hence there is no reduction in length. There is no reduction in content 

so the data coefficient remains the same. Th-2 modified abstract is 

shown in Figure 5.9. 

MINICOMPUTERS TURN CLASSIC. J.J.BARTIK:DATA PROCESSING. 12(1), 42-50(1970).
TWO MANUFACTURERS EVEN OFFER CENTRAL PROCESSORS WITH NO MEMORY WHATSOEVER. DATA
IS TRANSFERRED BETWEEN MEMORY AND THE CENTRAL PROCESSOR VIA A MEMORY BUS. THE
ENTIRE CORE IS ADDRESSABLE VIA INDEXING OR INDIRECT ADDRESSING AND GENERALLY
INPUT/OUTPUT MINICOMPUTERS INCLUDE A PROGRAMED PARTY LINE I/O CHANNEL. THE
DATA CHANNEL IS ONE WORD WIDE, EIGHT BITS FOR AN 8-BIT PROCESSOR AND 16 BITS FOR
A 16-BIT PROCESSOR. SLOW SPEED, CHARACTER-ORIENTED DEVICES ALSO INTERFACE TO
THE CHANNEL FOR DATA TRANSFERS, AND EACH TRANSFER IS UNDER PROGRAM CONTROL. THE
BLOCK TRANSFER OPTION DOES ALLOW RELATIVELY HIGH-SPEED DEVICES TO INTERFACE TO
THE CHANNEL FOR DATA TRANSFERS. A NUMBER OF MANUFACTURERS ALSO OFFER A
GENERAL-PURPOSE INTERFACE TO THE PROGRAMED I/O CHANNEL TO PROVIDE FOR ADDRESSING
AND CONTROLLING SPECIAL-PURPOSE I/O DEVICES AND FOR TRANSFERRING DATA.
MOST MINICOMPUTERS CAN SUPPORT AN OPTIONAL DIRECT MEMORY ACCESS CHANNEL.
INDIVIDUAL MANUFACTURERS OFFER ALGOL, BASIC, AND FOCAL COMPILERS.
MOST MANUFACTURERS OFFER PROGRAMMING SUPPORT ON AN INDIVIDUALLY NEGOTIATED
CONTRACT BASIS.

Figure 5.8 Computer-produced abstract of "Mini-computers Turn Classic".

MINICOMPUTERS TURN CLASSIC. J.J.BARTIK:DATA PROCESSING. 12(1), 42-50(1970).
TWO MANUFACTURERS EVEN OFFER CENTRAL PROCESSORS WITH NO MEMORY WHATSOEVER. DATA
IS TRANSFERRED BETWEEN MEMORY AND THE CENTRAL PROCESSOR VIA A MEMORY BUS. THE
ENTIRE CORE IS ADDRESSABLE VIA INDEXING OR INDIRECT ADDRESSING AND GENERALLY
INPUT/OUTPUT MINICOMPUTERS INCLUDE A PROGRAMED PARTY LINE I/O CHANNEL. THE
DATA CHANNEL IS ONE WORD WIDE, EIGHT BITS FOR AN 8-BIT PROCESSOR AND 16 BITS FOR
A 16-BIT PROCESSOR. SLOW SPEED, CHARACTER-ORIENTED DEVICES ALSO INTERFACE TO
THE CHANNEL FOR DATA TRANSFERS, AND EACH TRANSFER IS UNDER PROGRAM CONTROL. THE
BLOCK TRANSFER OPTION DOES ALLOW RELATIVELY HIGH-SPEED DEVICES TO INTERFACE TO
THE CHANNEL FOR DATA TRANSFERS. A NUMBER OF MANUFACTURERS ALSO OFFER A
GENERAL-PURPOSE INTERFACE TO THE PROGRAMED I/O CHANNEL TO PROVIDE FOR ADDRESSING
AND CONTROLLING SPECIAL-PURPOSE I/O DEVICES AND FOR TRANSFERRING DATA.
MOST MINICOMPUTERS CAN SUPPORT AN OPTIONAL DIRECT MEMORY ACCESS CHANNEL.
INDIVIDUAL AND MOST MANUFACTURERS OFFER ALGOL, BASIC, AND FOCAL COMPILERS AND
PROGRAMMING SUPPORT ON AN INDIVIDUALLY NEGOTIATED CONTRACT BASIS, RESPECTIVELY.

These examples serve to illustrate that the application of
procedures to modify the sentences of the abstract by structural analysis
can improve the quality of the abstracts, and this improvement will
result in the same or an increased value of the data coefficient. These
procedures can be applied to all abstracts and, although the improvement
may be more noticeable in some abstracts than in others, the quality will
not be lowered in any example. This set of rules is presently not
sufficient to improve abstracts so that the data coefficient is above
1.0 for all examples. Therefore additional rules should be added to the
modification procedure.
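
The behavior of the data coefficient in these examples can be sketched
numerically. The sketch assumes, as the discussion above suggests, that
the coefficient is the fraction of data elements retained divided by the
fraction of length retained. The function name, the counts, and the use
of words as the unit of length are illustrative assumptions and are not
taken from ADAM.

```python
def data_coefficient(doc_data, doc_length, abs_data, abs_length):
    """Ratio of data retained to length retained (assumed definition)."""
    if min(doc_data, doc_length, abs_length) <= 0:
        raise ValueError("document counts and abstract length must be positive")
    content_fraction = abs_data / doc_data      # share of data elements kept
    length_fraction = abs_length / doc_length   # share of length kept
    return content_fraction / length_fraction


# Hypothetical counts: a 4,000-word document containing 200 data elements.
before = data_coefficient(200, 4000, abs_data=30, abs_length=500)
# Merging two sentences without losing data or length changes nothing.
after_merge = data_coefficient(200, 4000, abs_data=30, abs_length=500)
# Deleting 50 words that carry no data raises the coefficient.
after_trim = data_coefficient(200, 4000, abs_data=30, abs_length=450)
```

Under this assumed definition, a modification helps only when it shortens
the abstract faster than it removes data, which is consistent with the
merged sentence above leaving the coefficient unchanged.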

2. Improvement of the System Implementation

The abstracting system, ADAM, is at present capable of producing
abstracts from the input of a complete document. ADAM can be improved
by improving the quality of the abstracts (see Section 1) and by
improving the implementation of the abstracting algorithms. The manner in
which the algorithms are programmed and the efficiency of operation of
the programs will determine the feasibility of actual use of ADAM in
an operational environment. The system improvements discussed in
Section 2 are aimed at making ADAM operate as efficiently as
possible and at making it competitive with existing abstracting procedures.

2.1. Modification of the Word Control List

As has already been pointed out, ADAM consists of two basic
components, the abstracting rules and the Word Control List (WCL). The
WCL is a list of words and word strings with which are associated codes
indicating the semantic and/or syntactic role each plays. The dichotomy
of rule and WCL is an important system design parameter. By making the
WCL act as an interface between the text and the abstracting rules,
processing complexity is reduced, efficiency is improved, and
considerable flexibility is gained in the control one has over the way in
which the abstracting system operates. The abstracting rules deal only with
metalinguistic (nonterminal) symbols. The WCL supplies these symbols.
Consequently, if we can assume that the rules are adequate, the WCL
will determine the exact nature and content of an abstract produced by
the system. It is therefore very important to know as precisely as
possible what entries should be in the WCL, what codes should be
associated with each entry, how frequently each entry is actually applied
in the processing of an "average" document, and so on. In short, it is
necessary, both for effective and efficient abstract production, to
determine goodness criteria for the WCL and to use these criteria as a
basis for optimizing the WCL.

Recalling that the design of ADAM is based on the notion that
expressions signifying low information content are more nearly constant
and are therefore more easily identified, it is readily understood why
it is important to construct the WCL so that it incorporates as high a
concentration of these well-used expressions as possible. To develop
such a WCL, several aspects of the existing WCL should be studied. These
aspects include the determination of the frequency of use of each entry
in the existing WCL, an analysis of abstracts derived from a set of
documents to determine additional WCL entries which might profitably
be used, and a careful determination of the proper semantic and
syntactic roles to be associated with each WCL entry. These studies
should also be designed to answer the question of whether additional
codes should be associated with WCL entries. Finally, a procedure
should be implemented through which the WCL may be modified as dictated
by any particular application of the abstracting system.
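
The first of these studies, the frequency of use of each WCL entry, can
be sketched as a simple count over a sample of documents. This is a
minimal illustration only: the function name, the tokenization, and the
sample entries are assumptions and do not reproduce ADAM's matching
process.

```python
import re
from collections import Counter


def wcl_usage(wcl_entries, documents):
    """Count occurrences of each WCL entry (word or word string) in the documents."""
    counts = Counter({entry: 0 for entry in wcl_entries})
    for text in documents:
        tokens = re.findall(r"[A-Z0-9'-]+", text.upper())
        for entry in wcl_entries:
            words = entry.upper().split()
            n = len(words)
            # Slide a window of n tokens over the text and compare.
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == words:
                    counts[entry] += 1
    return counts


# Hypothetical WCL entries and a two-document sample.
wcl_entries = ["IN THIS PAPER", "OBVIOUSLY", "HOWEVER"]
documents = [
    "In this paper we show that, obviously, the result holds.",
    "Obviously the method works; in this paper it is extended.",
]
usage = wcl_usage(wcl_entries, documents)
unused = [entry for entry, count in usage.items() if count == 0]
```

Entries that are never applied over a representative sample (here the
hypothetical entry HOWEVER) would be candidates for removal, while the
most frequent entries indicate where matching time is actually spent.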

2.2. Use of the Word Control List to Control the Subject Orientation
of Abstracts

As initially designed, ADAM was viewed as being able to produce
abstracts with no special regard for the subject area of the original
document. While I still hold this view, it must be recognized that it
is clearly possible to produce abstracts slanted toward some particular
subject area, for some particular purpose. In ADAM this control
over the subject orientation of the produced abstract
could be gained through manipulation of the WCL. And such
manipulation can be done without any modification of the programs of
the system.

Modifications of the WCL would fall into one or the other of two
classes: 1) addition of words common to a particular subject area,
which thus serve as function words, and 2) addition of special entries
with semantic code (see Table 3.1), which would cause sentences
containing these entries always to be selected for the abstract. Such
modifications would have the effect of providing to the reader (or
searcher) more specific data (i.e., would produce informative rather
than indicative abstracts) and of indicating the viewpoint of the
abstracting system.
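
The point that subject orientation can be changed through the WCL alone
can be illustrated with a deliberately simplified model. The two code
names, the selection rule, and the sample entries below are hypothetical
stand-ins; they are not the semantic codes of Table 3.1, and only the
second class of modification is modeled.

```python
import re

REJECT = "reject"                # hypothetical code: rejection candidate
ALWAYS_SELECT = "always-select"  # hypothetical code: sentence must be kept


def select_sentences(sentences, wcl):
    """Keep a sentence unless it matches a REJECT entry and no ALWAYS_SELECT entry."""
    kept = []
    for sentence in sentences:
        words = re.findall(r"[A-Z0-9'-]+", sentence.upper())
        codes = {wcl[word] for word in words if word in wcl}
        if ALWAYS_SELECT in codes or REJECT not in codes:
            kept.append(sentence)
    return kept


sentences = [
    "Previously the yield of the catalyst was low.",
    "The apparatus is shown in Figure 2.",
    "Obviously more work is needed.",
]
general_wcl = {"PREVIOUSLY": REJECT, "OBVIOUSLY": REJECT}
# Slanting toward catalysis changes only the data, not select_sentences().
slanted_wcl = dict(general_wcl, CATALYST=ALWAYS_SELECT)

general = select_sentences(sentences, general_wcl)
slanted = select_sentences(sentences, slanted_wcl)
```

With the general list only the second sentence survives; with the slanted
list the first sentence is retained as well, although the selection
procedure itself is untouched.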
2.3. Analysis of Data Structures

A data structure may be defined as an ordered set of data elements,
together with some particular interpretation. The interpretation
assigns to a particular data element in a particular position in the
structure some particular function or meaning. A data structure may
also be characterized in terms of the procedure which utilizes it. In
the case of ADAM, several data structures are
employed for the several basic procedures which make up
the system. Three important data structures utilized in the
abstracting system may be identified. These are

1. the structure associated with the input text, 

2. the data structure associated with the WCL and the allied 
matching process, and 

3. the data structure utilized in the application of the 
abstracting rules. 

The existing data structures should be analyzed in terms of their
overall efficiency in the abstracting process. Alternative data structures
which may prove more efficient than those presently in use should be
considered.

2.3.1. The Structure Associated with the Input Text 

In the case of the data structure associated with the input of
the original document, ADAM utilizes a memory area of
fixed size for storing the text during the entire process of producing
an abstract from it. The storage allocated for this purpose (40,000
characters) is adequate for the storage of documents which contain, on
the average, fewer than 5,000 text tokens. If the length of the
document exceeds this number of tokens, ADAM prints an error message,
skips over the document, and thus does not produce an abstract for that
document. It is clear that the existing data structure is therefore
inefficient for several reasons. It is obviously inefficient in dealing
with large documents, since such documents would have to be reduced in
size and recycled through the system. The data structure is also
inefficient because storage is wasted for all documents smaller than the
maximum document size allowed. Since documents are of variable length,
a better data structure might be one which provides just the amount of
storage needed to store a document (provided the total amount of storage
available in the computer system is not exceeded). But such a data
structure requires a dynamic storage allocation mechanism, and the
time required to manage the storage allocation might offset any gain in
efficiency of use of storage. Various methods for dynamically allocating
storage for the input document should be studied, along with a comparison
of the overall efficiency of these methods with the static storage
allocation method currently employed in the system.

To exemplify the allocation of storage areas for document input,
consider the following. The average document which the abstracting
system will come into contact with would have a length equivalent to
15,000 characters. On the basis of this estimate, the system would
be designed so that this number of characters was allocated before
document input was initiated. If, during the actual input step, this
initial amount of storage was found to be insufficient, an additional
block of 1,000 characters of storage would be allocated. If this
additional space was not sufficient to hold the entire document, an
additional block would be allocated. This process would be continued
until the document had been read in its entirety, or until no more
storage was available for allocation. Such a technique is not, however,
without its problems. Since each new block of storage allocated will
in all probability not be contiguous with the previously allocated
block, a storage management routine would have to be provided to keep
account of the storage addresses associated with each block and to
provide a chaining mechanism for handling the blocks as an integral
unit. Furthermore, 1,000-character blocks might still result in somewhat
inefficient usage of the memory, but smaller blocks would increase the
storage management problems. Thus it should be clear that optimizing
the data structure for document storage requires careful study of
alternative methods for storage allocation and management.
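
The allocation scheme just described can be sketched as follows. The
class and its method names are hypothetical, the 40,000-character limit
is borrowed from the present fixed allocation merely as a stand-in for
"no more storage available", and a Python list of blocks stands in for
the chaining mechanism that a storage management routine would provide.

```python
class ChainedTextStore:
    """Text store that grows by chaining fixed-size blocks (illustrative only)."""

    def __init__(self, initial_size=15_000, block_size=1_000, limit=40_000):
        self.block_size = block_size
        self.limit = limit                        # total storage available
        self.blocks = [bytearray(initial_size)]   # chain of allocated blocks
        self.used = 0                             # characters stored so far

    def capacity(self):
        return sum(len(block) for block in self.blocks)

    def _locate(self, position):
        """Map a logical character position to (block, offset) via the chain."""
        for block in self.blocks:
            if position < len(block):
                return block, position
            position -= len(block)
        raise IndexError(position)

    def append(self, text):
        for byte in text.encode("ascii"):
            if self.used == self.capacity():
                if self.capacity() + self.block_size > self.limit:
                    raise MemoryError("no more storage available")
                self.blocks.append(bytearray(self.block_size))
            block, offset = self._locate(self.used)
            block[offset] = byte
            self.used += 1

    def text(self):
        return b"".join(self.blocks)[:self.used].decode("ascii")


store = ChainedTextStore()
store.append("X" * 16_500)                 # longer than the initial estimate
wasted = store.capacity() - store.used     # unused space in the last block
```

A 16,500-character document thus occupies the initial area plus two
chained blocks and leaves 500 characters unused, whereas the present
fixed allocation would leave 23,500 characters idle for the same document.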

2.3.2. The Structure Associated with the Word Control List and the
Matching Process

In the prototype system, the Word Control List is stored on disk
in alphabetical order and is read into core memory at the time it is to
be compared with the text of the original document. Since the WCL is
in alphabetical order, the text must also be alphabetized. This is
accomplished by means of a pointer sort using the method of quadratic
selection with exchange. Obviously, if a way could be found of effecting
the comparison of WCL and text without having to sort the text, then a
considerable saving in processing time would be gained.

Another problem associated with data representation in the WCL is
that each entry must match exactly a text word or word string or else
the entry is considered to be a complete mismatch. Many words in
the text may have the same stem (or root) as an entry in the WCL but
are encumbered with inflectional endings which cause them to appear
different from the WCL entry. To cope with this difficulty the WCL is
now made to contain the various forms of the same word (as, for example,
UNUSUALLY and UNUSUAL). From other studies (9) it is known that it is
possible in most instances to deal with inflectional forms by means of
affix elimination techniques. But more recently, it has been shown
that key generation techniques usually associated with hashing studies
may provide an even more attractive solution to the word form problem
(10). Furthermore, such a method suggests a solution to the matching
problem alluded to earlier. A text word could, by suitable means, be
converted to a key (which would be used in the actual matching operation)
which would be converted to an address pointing to that entry or set
of entries in the WCL with which the word might match. Matching could
then be carried out on the key only, or first on the key and then the
entire word. Certainly, such data-controlled addressing has been used
successfully in systems dealing with much larger data structures than
that associated with the WCL (11).
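
A minimal sketch of such key-based matching is given below. The key
generator (strip one common ending, then truncate to five characters) is
a crude stand-in chosen for illustration and is not the technique of
reference (10); the suffix list, function names, and codes are likewise
assumptions. A Python dictionary plays the role of the key-to-address
conversion, and only single-word entries are handled.

```python
SUFFIXES = ("INGLY", "EDLY", "ING", "LY", "ED", "ES", "S")   # illustrative only


def make_key(word, length=5):
    """Strip one common ending and truncate; a stand-in for a real key generator."""
    word = word.upper()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return word[:length]


def build_index(wcl):
    """Group WCL entries by key so that a key leads to a small candidate set."""
    index = {}
    for entry, code in wcl.items():
        index.setdefault(make_key(entry), []).append((entry, code))
    return index


def lookup(word, index, exact=False):
    """Match on the key only, or first on the key and then on the entire word."""
    candidates = index.get(make_key(word), [])
    if exact:
        return [(entry, code) for entry, code in candidates if entry == word.upper()]
    return candidates


wcl = {"UNUSUAL": "A", "OBVIOUS": "B"}      # hypothetical entries and codes
index = build_index(wcl)
by_key = lookup("unusually", index)          # inflected form found through its key
by_word = lookup("unusually", index, exact=True)
```

Because each text word is converted directly to the location of its
candidate entries, neither the text nor the WCL needs to be alphabetized,
and the WCL no longer has to list UNUSUALLY separately from UNUSUAL.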
2.3.3. The Structure Utilized in the Application of the Abstracting
Rules

The third data structure of major importance in ADAM is that
called the Table. The Table consists of entries corresponding to the
words in the input document. Each Table entry occupies 8 characters of
storage and contains the address of the word in the input document,
the length of the word, an alphabetic pointer, and space for a semantic
and a syntactic code. A fixed amount of storage is allocated for the
Table in advance of its creation, and every word in the input text has
a corresponding Table entry. The reason for this is that the input
text is compared with the WCL through use of the pointers contained in
the Table. While the Table is an essential component of the abstracting
system, it can be made more efficient, at least in terms of storage
utilization. If a way can be found of effecting matching with the WCL
without the necessity of alphabetizing the text, then the Table could
be built to contain only those text words which matched WCL entries,
together with sentence and clause markers (periods and commas, mainly),
without detriment to the operation of the abstracting procedures. One
might also gain some additional reduction in storage requirements by
representing the semantic and syntactic codes numerically rather than
by character codes.
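
The reduced Table suggested here can be sketched as follows. The entry
layout, the numeric code values, and the direct dictionary lookup are
assumptions made for illustration; the alphabetic pointer is omitted
because no sort is performed in this variant.

```python
import re
from dataclasses import dataclass

PERIOD, COMMA = 1, 2   # hypothetical numeric codes for sentence and clause markers


@dataclass
class TableEntry:
    address: int     # offset of the token in the stored text
    length: int      # number of characters in the token
    semantic: int    # numeric semantic code (0 = none)
    syntactic: int   # numeric syntactic code (0 = none)


def build_sparse_table(text, wcl):
    """Keep Table entries only for WCL matches and for periods and commas."""
    table = []
    for match in re.finditer(r"[A-Za-z0-9'-]+|[.,]", text):
        token = match.group()
        if token == ".":
            codes = (0, PERIOD)
        elif token == ",":
            codes = (0, COMMA)
        elif token.upper() in wcl:
            codes = wcl[token.upper()]
        else:
            continue                     # ordinary word: no entry needed
        table.append(TableEntry(match.start(), len(token), *codes))
    return table


text = "Obviously, the results are unusual. The method is described."
wcl = {"OBVIOUSLY": (3, 0), "UNUSUAL": (4, 0)}   # entry -> (semantic, syntactic)
table = build_sparse_table(text, wcl)
word_count = len(re.findall(r"[A-Za-z0-9'-]+", text))
```

For this sample the reduced Table holds five entries (two WCL matches,
one comma, two periods), whereas a Table with one entry per word would
need nine entries for the words alone.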

2.4. Design of the Semantic Module 

ADAM is programmed with several modules to perform specified
functions which are controlled by one main module. The abstracting
rules which implement the functions specified in the Word Control List
are programmed in the Semantic module. This module can be reprogrammed
to incorporate new abstracting rules without changing the rest of the
system. Although most minor modifications can be made by changes in
the Word Control List, I feel that two major changes should be
incorporated into the Semantic module.

First, the verb check within the Semantic module should be
eliminated in favor of a verb check by means of structural analysis.
The Semantic module currently has a section of code which examines the
syntactic codes of each of the words in each sentence in order to
locate a verb code. If the sentence contains at least one verb, then
it is included in the abstract if it has met all other requirements for
inclusion. If the sentence does not contain a verb, then it is deleted.
This procedure guarantees that each sentence in the abstract will
contain a verb. While it is clearly desirable that each sentence of
the abstract contain a verb, this method poses two disadvantages. First,
the size of the Word Control List must be increased because of the
addition of all the verbs that are to be recognized. Since processing
time increases with increases in the size of the dictionary, this cuts
down on the system throughput. Second, many valid verbs are unrecognized
because they do not match an entry in the Word Control List. These
sentences, which might be valuable in the abstracts, are thus omitted.
Since most sentences which are selected for the abstract will be taken
directly from the document, it is probably safe to assume that those
sentences are well-formed and contain a verb. Thus, no verb check is
needed in those cases. If a sentence is not well-formed, a simple
check for a set of specified verbs will not always serve to eliminate
the sentence. A better approach would be to incorporate in the sentence
modification phase of the program (which is discussed in Section 1 of
this Chapter) a test for well-formed sentences which all include verbs.
This modification would allow greater numbers of verbs to be recognized
as valid and would increase the efficiency of the system.

Second, a semantic code for 'super-delete' should be added entirely
within the Semantic module. This code would take precedence over the
1 code (see Table 3.1), which indicates the highest priority of
importance. The super-delete code would be assigned to a sentence to
indicate that it should not be reinstated by any sentence which refers
to it. A super-delete code would be assigned to all questions, equations,
or direct quotations. A super-delete code would also be used in the
reinstatement procedure for intersentence reference. If a sentence
requires an antecedent, then the previous sentence would normally be
reinstated. If the previous sentence has been assigned a super-delete
code, it would be impossible to reinstate the required antecedent
sentence, so the sentence under examination would then be marked with a
super-delete code. This would be useful in preventing the inclusion
of a sentence without an antecedent in the abstract and in preventing
the reinstatement of that sentence if the following sentence also
requires an antecedent. These two changes could be incorporated in the
Semantic module and would improve both the performance and the efficiency
of the abstracting system.
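
The proposed use of the super-delete code in the reinstatement procedure
can be sketched as follows. The three status names and the function are
hypothetical, each sentence is reduced to a status and a flag saying
whether it requires an antecedent, and the treatment of a first sentence
that requires an antecedent is an added assumption.

```python
KEEP, DELETE, SUPER_DELETE = "keep", "delete", "super-delete"


def resolve_antecedents(status, needs_antecedent):
    """Reinstate antecedents where possible; otherwise propagate super-delete."""
    status = list(status)

    def ensure(k):
        """Try to make sentence k usable in the abstract; return True on success."""
        if status[k] == SUPER_DELETE:
            return False
        if needs_antecedent[k]:
            # Assumption: a first sentence that needs an antecedent cannot be kept.
            if k == 0 or not ensure(k - 1):
                status[k] = SUPER_DELETE
                return False
        status[k] = KEEP
        return True

    for i in range(len(status)):
        if status[i] == KEEP:
            ensure(i)
    return status


# Sentence 1 is a question (super-deleted); sentences 2, 3 and 5 need antecedents.
status = [KEEP, SUPER_DELETE, KEEP, KEEP, DELETE, KEEP]
needs_antecedent = [False, False, True, True, False, True]
resolved = resolve_antecedents(status, needs_antecedent)
```

In this example sentence 2 cannot recover its antecedent and is itself
super-deleted, which in turn removes sentence 3; sentence 5, by contrast,
reinstates the ordinarily deleted sentence 4.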
3. Directions for Future Research

The research described in this dissertation could serve as the
basis for additional research projects which are beyond the scope of
this project. There are five areas for further research: large-scale
testing, linguistic analysis, system implementation, adaptive and
learning behavior, and inter-system compatibility. These five areas
are discussed in the next five sections.

3.1. Large-Scale Testing 

The automatic abstracting system and evaluation criterion appear
to be ready for a test of their practicability in an operational
environment. The test should be constructed as a realistic procedure
for producing abstracts for an information system that currently
produces abstracts manually. The goals of the existing system should
be carefully specified so that the data element can be defined in
accordance with the use of the abstracts produced. The cost of
abstracting a given set of test documents should also be determined.
Cost and quality comparisons should be made based on a given set of
documents which are representative of the total data base.

The abstracting system should be implemented to produce abstracts
as similar to the desired system goals as possible. This might mean
modification, addition, or deletion of entries in the Word Control List.
The format of the output should be changed to conform as closely as
possible to the format of the output used by the system. The
evaluation criteria should be programmed to recognize the defined data
element and make the data content/length comparisons. The test set
of manually-produced abstracts should be available in machine-readable
form for their evaluation by the same criterion.

When all the parameters are carefully specified, the next step is
to actually perform the test. As with many experiments, the time
needed to perform the test will be small compared to the time of
preparation and evaluation of results. Careful records of time, memory
requirements, and cost should be made on the computer runs needed to
abstract the documents and to perform the evaluation. The quality of
the computer-produced abstracts should be compared to the quality of
the manually-produced abstracts and to some absolute standard in terms
of the data concentration factor. The results of this evaluation should
be used to improve the efficiency and operation of the abstracting
system in areas of deficiency. The cost and quality factors could be
used to study the feasibility of system implementation.

The abstracting system may not give a high performance on its
first large-scale test. Bernier states that beginning human indexers
may not do too well on their first experience, either.

Beginning indexers at Chemical Abstracts often had 50-75%
of their index entries changed upon checking. Perhaps half
of these changes were of a minor nature (e.g., of
paraphrasing, abbreviation, elimination of redundancy) and not
of a major nature (e.g., omissions, scattering, mistakes
or errors). Had these indexers not received their B.S. or
higher degrees in chemistry, then the percentage of changes
would have been greater. (12)

A beginning indexer must be trained by an evaluation of his work with
suggestions for improvement. It is only reasonable to expect that the
abstracting system will have to be modified to reflect its experience.

3.2. Linguistic Analysis

ADAM, as well as other natural language processing programs, can
be improved with an increased awareness of the fundamental nature of
language. Most of the rules of the abstracting system are based on an
ad hoc determination of what seemed reasonable. These rules could be
substantiated or denied by studies of the way language is used to
communicate ideas. For a given data base, statistical studies could
be made on the frequency of appearance in the text of entries in the
Word Control List. Statistical studies could also be made on the
number of times each rule in the Semantic module is applied. The
results of these studies could be used to eliminate the rules that are
infrequently applied and that decrease the efficiency of the abstracting
system. The results could also be used to indicate which rules process
the most frequently used expressions.

The abstracting system could be further improved by incorporating
additional algorithms which indicate semantic and syntactic information.
Input of both upper and lower case characters would allow an analysis
of capitalization in each sentence. This would be useful in identifying
proper names and sentence boundaries. Incorporation of analysis
programs to identify phrase and clause boundaries might also be useful
in selecting portions of a sentence for inclusion in the abstract.
This, along with more sophisticated algorithms to determine the use of
commas and periods, would enable the program to have more complete
data on the manner in which the ideas in the document are expressed,
both in words and in structure. It would be helpful to be able to
recognize synonyms in the text. This synonymity might be between
singular and plural forms of the same word or between words which are
completely different in spelling. For example, it would be beneficial
if the system could make use of the fact that "automatic abstracts",
"automatically-produced abstracts", and "computer-produced abstracts"
all refer to one idea.
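
One simple way of making use of such synonymity is to reduce the variant
expressions to a single canonical form before further processing. The
sketch below is illustrative only: the synonym table is hypothetical and
hand-built, and nothing is implied about how such a table would be
derived.

```python
import re

# Canonical form -> variant expressions (hypothetical, hand-built list).
SYNONYMS = {
    "AUTOMATIC ABSTRACT": [
        "AUTOMATIC ABSTRACTS",
        "AUTOMATICALLY-PRODUCED ABSTRACT",
        "AUTOMATICALLY-PRODUCED ABSTRACTS",
        "COMPUTER-PRODUCED ABSTRACT",
        "COMPUTER-PRODUCED ABSTRACTS",
    ],
}


def canonicalize(text, synonyms=SYNONYMS):
    """Replace each variant expression by its canonical form (upper case)."""
    text = text.upper()
    for canonical, variants in synonyms.items():
        for variant in variants:
            # Word boundaries keep a singular variant from matching inside a plural.
            pattern = r"\b" + re.escape(variant) + r"\b"
            text = re.sub(pattern, canonical, text)
    return text


sentence = "Computer-produced abstracts and automatic abstracts were compared."
normalized = canonicalize(sentence)
mentions = normalized.count("AUTOMATIC ABSTRACT")
```

After normalization the two differently worded expressions in the sample
sentence are counted as two mentions of the same idea.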

In many documents, one complete thought is written in more than
one sentence. Identification of intersentence references is very
important in preserving the meaning and the coherency of the abstracts.
Determination of intersentence references is dependent on both semantic
and syntactic information from the document. With this information, it
might be possible to create sentences for the abstract which combine
two sentences from the document into a single sentence in the abstract.
The sentences of the abstract should also then come together to create
a coherent abstract.

3.3. System Implementation

The future of automatic abstracting as a viable method of producing
abstracts in an operational environment depends in large measure on
the reliability of the system. The abstracts produced should have a
data coefficient above the minimum level of acceptability in almost
every case. A high degree of dependability is necessary before
any manager would even consider converting from a manual to a
computer-based method for production of abstracts.

A possible method for implementing a computer-based system
would be to include human post-editing of all abstracts produced. This
would remove obvious errors and allow for maintenance of a quality
standard. The initial systems that convert from manually-produced to
computer-produced abstracts might have to retain all of their abstractors
initially to edit the computer output. If the editors' remarks could
be used as feedback to the system, the quality of the computer-produced
abstracts could be improved.

3.4. Adaptive and Learning Behavior

Ideally ADAM should be able to learn from his mistakes and not be
a novice abstractor forever. The ability to improve will probably
result from the modification of the abstracting system components by
the system designers. It would be desirable if ADAM were able to learn
from his past experience and adapt his behavior based on the data he
currently processes. This adaptive behavior might be based on a link
with previous experience through a memory bank. It might also include
access to other information systems which provide document services to
users. This abstracting system, with its ability to learn, adapt,
communicate, and remember, could be truly described as an intelligent
system.

This level of intelligence, and it is questionable if it is even
desirable, certainly will be difficult, if not impossible, to attain.
This is an example of what Bar-Hillel calls the 80% fallacy. "The
remaining 20% will require not one quarter of the effort spent for the
first 80%, but many, many times this effort, with a few percent
remaining beyond the reach of every conceivable effort." (13) My research
has been where the slope of the curve on the asymptotic approach to
perfection has been relatively small. Further research will require a
much greater expenditure of effort for even modest gains.

3.5. Inter-System Compatibility

As has been pointed out elsewhere (14, 15, 16), an Information
Storage and Retrieval system acts as an interface between a source (a
scientist originating a document) and a receiver (a scientist seeking
a document) as depicted in Figure 5.10. The operation of an Information
Storage and Retrieval system involves a number of processes which are
effected in approximately chronological order. Let us consider these
main processes in this order. First, although an automatic abstracting
system requires an entire document in computer-processable form, this
fact should produce no limitation on the operation of an Information
Storage and Retrieval system, since more and more primary sources are
being published using computer-based photo-composition (computer output
on microfilm, COM) with the full text as a by-product data base (17).
An Information Storage and Retrieval system should be able to take
full advantage of such data bases. Furthermore, optical character
readers (OCR) are becoming technically more sound, operationally more
general and reliable, and economically more attractive (18), so that
original documents not in machine-readable form can be obtained in that
form rather easily.

Figure 5.10 General communication system model, showing the role of the
Information Storage and Retrieval System as an interface
between source and receiver. Components of the Information
Storage and Retrieval system are identified by broken lines.

The next step in the process is the automated editing of the input
text. In general, this editing involves the deletion of unwanted
portions of the text (for instance, figures or other document attributes
considered to be unsuitable in a given application), or the
reorganization or restructuring of some document attributes (such as
standardization of bibliographic citations). Once the editing process is
completed, the document is ready to be abstracted. Simultaneously with
abstract preparation, the references in the original document could be
isolated as input to a citation indexing procedure. Citation indexes have
considerable value in themselves as search tools (19, 20).

Once an abstract is available, it may be printed as part of an
abstract bulletin and at the same time be used as the basis for an
automated indexing procedure. A variety of indexing methods is already
available (21, 22, 23, 24) which could be employed to produce a printed
index from the abstract. A title index might also be produced as a
smaller, less expensive, "throwaway" index. The index entries derived
from the abstract could be used without further processing
(free-vocabulary indexing) or the index vocabulary could be controlled
automatically (as is done presently in at least one instance (25)).

But having created an index to the document collection, using the
abstract as a basis, does not obviate the use of the abstract as the
basis for an automated search and retrieval system, either batch or
on-line (26). Furthermore, the abstract could be augmented with index
terms derived through use of a vocabulary-control procedure. And a
variety of formats for the data suggest themselves, which could enhance
the effectiveness of a retrieval system under particular circumstances.

Responses to a user inquiry to a retrieval system based on
abstracts could take the form of the bibliographic citation, but a
more "intelligent" system, possessing such a data base, could also
provide parts of or whole abstracts as responses and, under suitable
circumstances, parts or all of the original document could be supplied
as responses as well.

The central feature of the hypothetical Information Storage and
Retrieval system (summarized in Figure 5.11) that I have described is
the abstract. With an efficient means of producing effective abstracts,
the entire Information Storage and Retrieval system becomes feasible.
Automated abstracting methods constitute the sole missing component
of such a system. It is, therefore, one of the purposes of research
and development work on automatic abstracting to provide this missing
component and thereby make such an Information Storage and Retrieval
system possible.

4. Conclusions 

4.1. The Design of an Automatic Abstracting System ^ 

J' 

The manual production of abstracts is based on the ability of a
trained abstractor to read a document, understand its contents, select
certain key ideas, and rewrite these key ideas to form a coherent
abstract. Figure 5.12 illustrates these basic steps in the manual
production of abstracts. Automatic abstracting systems are designed
to produce abstracts in a similar manner. The system receives an
original document as input. The system then selects certain key
sentences from the document to be included in the extract. Figure 5.13
shows the general character of all existing automatic abstracting
systems. The selection method of the abstracting system described in
this dissertation relies on the rejection of sentences which are
unsuitable for inclusion in the extract. Those sentences which are not
rejected are included in the final abstract. This method coincides
with the intuitive idea that an abstract should help the user by
screening out those portions of a document which are not the most useful
to him. This method also provides a practical means of implementing the
process of abstracting.

Figure 5.11 Hypothetical Information Storage and Retrieval system
using abstracts as the basic data source.

Figure 5.12 The basic steps in the manual production of abstracts.
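
The selection-by-rejection idea summarized in Section 4.1 can be
illustrated with a minimal Python sketch. It is not the ADAM
implementation: the cue phrases in REJECTION_CUES, the naive sentence
splitter, and all function names are assumptions made for illustration
only. The sketch shows only the general shape of the method: every
sentence is considered, and whatever is not rejected is kept.

```python
import re

# Hypothetical rejection cues, chosen only for illustration; the actual
# rejection rules of the system are not reproduced here.
REJECTION_CUES = ("for example", "in other words", "previous work")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def is_rejected(sentence):
    """A sentence is rejected if it contains any of the cue phrases."""
    lowered = sentence.lower()
    return any(cue in lowered for cue in REJECTION_CUES)


def select_by_rejection(text):
    """Consider every sentence; keep those that are not rejected."""
    return [s for s in split_sentences(text) if not is_rejected(s)]
```

Calling select_by_rejection on a document returns the surviving
sentences in their original order, which corresponds to the extract
before any later revision step.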

4.2. The Improvement of an Automatic Abstracting System 

Comparison of Figures 5.12 and 5.13 suggests that an important
refinement of automatic abstracting systems would be that involving the
development of procedures for the modification of the form, arrangement,
and content of the sentences selected for the abstract. This
modification would produce abstracts in which the flow of ideas was
improved and which represented a more nearly coherent whole. This
modification is based on a structural analysis and revision of each
sentence in order to make the abstracts more acceptable to the reader.
The addition of the modification procedure is shown in Figure 5.14.
The generation and revision of sentences is as complex a procedure as
the selection procedure, so addition of this phase will make the
abstracting system larger and more costly to operate. The revision of
sentences appears to be warranted because the resulting abstracts have
a more readable, coherent style.

Figure 5.13 The basic steps in the computer production of abstracts.

4.3. The Evaluation of an Automatic Abstracting System

The design of an automatic abstracting system must rely on some
intuitive idea of what constitutes a good abstract. This criterion of
quality must be explicitly defined so that it may be used as a measure
of the effectiveness of the abstracting system. Inclusion of an
evaluation procedure in the abstracting system is shown in Figure 5.15.
The evaluation criterion that I have developed is based on the axiom
that the best abstract among a set of abstracts is that one which
presents a maximum amount of data in the minimum amount of length.
Since measuring the "information content" of a document presents a
very difficult practical problem, a measure of data, which is easier
to implement, is formulated. Since information can be considered as
data of value in decision making, the amount of information in a given
abstract will always be less than or equal to the amount of data. This
method of evaluating abstracts can be adapted to many different systems
where abstracts are used by adapting the definition of data element to
the goals of the particular system. This method also provides an
objective criterion for abstract evaluation. Almost all previous
evaluation techniques relied on the opinions of people which would
often not be uniform or consistent. Algorithmic improvement of
automatic abstracting systems using only a subjective method of
evaluation is almost impossible, so this development of an objective
method is particularly important. This dissertation has presented my
research on an automatic abstracting system, its design, evaluation,
and improvement.

Figure 5.15 The addition of an evaluation procedure to the computer
production of abstracts.
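
The evaluation axiom of Section 4.3 (maximum data in minimum length)
can be illustrated with a short Python sketch. The definition of a data
element used here, a distinct non-stopword word that also occurs in the
parent document, is an assumption made only for illustration, as are
the STOPWORDS list and the function names. As the text notes, the
definition of a data element is meant to be adapted to the goals of the
particular system.

```python
# Hypothetical stopword list; a real system would define its own.
STOPWORDS = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})


def data_elements(text):
    """Assumed data elements: distinct lowercase words outside STOPWORDS."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def data_density(abstract, parent):
    """Data elements shared with the parent, per word of abstract length."""
    length = len(abstract.split())
    if length == 0:
        return 0.0
    shared = data_elements(abstract) & data_elements(parent)
    return len(shared) / length


def best_abstract(candidates, parent):
    """Pick the candidate presenting the most data in the least length."""
    return max(candidates, key=lambda a: data_density(a, parent))
```

Because the score is computed directly from the abstract and its parent
document, two runs over the same inputs always agree, which is the
property that distinguishes an objective criterion from evaluation by
opinion.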
Supplementary Bibliography for Computer-Based Abstracting

In preparation for my research in computer-based abstracting, I
examined many publications reporting the results of related research.
I have cited many of these publications as direct references to portions
of my dissertation. There are also many related publications which I
have not cited directly. I have included a list of these publications
here as a supplementary bibliography.